E. Hairer 
G. Wanner


UNDERGRADUATE TEXTS IN MATHEMATICS 


Analysis 
by Its History 


Springer 



Editors 


E. Hairer 
G. Wanner 

Department of Mathematics 
University of Geneva 
Geneva, Switzerland 


Editorial Board 
S. Axler 

Mathematics Department 
San Francisco State 
University 

San Francisco, CA 94132 
USA 

axler@sfsu.edu 


K.A. Ribet 

Mathematics Department 
University of California
at Berkeley 

Berkeley, CA 94720-3840 
USA 

ribet@math.berkeley.edu 


ISBN: 978-0-387-77031-4 e-ISBN: 978-0-387-77036-9 

Library of Congress Control Number: 2008925883 
© 2008 Springer Science+Business Media, LLC 

All rights reserved. This work may not be translated or copied in whole or in part without the written 
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, 
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use 
in connection with any form of information storage and retrieval, electronic adaptation, computer 
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. 
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they 
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are 
subject to proprietary rights. 




Preface 


. . . that departed from the traditional dry-as-dust mathematics textbook. 

(M. Kline, from the Preface to the paperback edition of Kline 1972) 

Also for this reason, I have taken the trouble to make a great number of 
drawings. (Brieskorn & Knörrer, Plane algebraic curves, p. ii)


... I should like to bring up again for emphasis . . . points, in which my 
exposition differs especially from the customary presentation in the text- 
books: 

1. Illustration of abstract considerations by means of figures. 

2. Emphasis upon its relation to neighboring fields, such as calculus of dif- 
ferences and interpolation . . . 

3. Emphasis upon historical growth. 

It seems to me extremely important that precisely the prospective teacher 
should take account of all of these. (F. Klein 1908, Engl. ed. p. 236) 


Traditionally, a rigorous first course in Analysis progresses (more or less) in the 
following order: 


    sets, mappings  →  limits, continuous functions  →  derivatives  →  integration.


On the other hand, the historical development of these subjects occurred in reverse
order:


    Cantor 1875, Dedekind  ←  Cauchy 1821, Weierstrass  ←  Newton 1665, Leibniz 1675  ←  Archimedes, Kepler 1615, Fermat 1638


In this book, with the four chapters 


    Chapter I.    Introduction to Analysis of the Infinite
    Chapter II.   Differential and Integral Calculus
    Chapter III.  Foundations of Classical Analysis
    Chapter IV.   Calculus in Several Variables,


we attempt to restore the historical order, and begin in Chapter I with Cardano, 
Descartes, Newton, and Euler’s famous Introductio. Chapter II then presents 17th 
and 18th century integral and differential calculus “on period instruments” (as a 
musician would say). The creation of mathematical rigor in the 19th century by 
Cauchy, Weierstrass, and Peano for one and several variables is the subject of 
Chapters III and IV. 

This book is the outgrowth of a long period of teaching by the two authors. 
In 1968, the second author lectured on analysis for the first time, at the University 
of Innsbruck, where the first author was a first-year student. Since then, we have 
given these lectures at several universities, in German or in French, influenced by
many books and many fashions. The present text was finally written up in French 
for our students in Geneva, revised and corrected each year, then translated into 
English, revised again, and corrected with the invaluable help of our colleague 
John Steinig. He has corrected so many errors that we can hardly imagine what 
we would have done without him. 





Numbering: each chapter is divided into sections. Formulas, theorems, fig- 
ures, and exercises are numbered consecutively in each section, and we also in- 
dicate the section number, but not the chapter number. Thus, for example, the 
7th equation to be labeled in Section II.6 is numbered “(6.7)”. References to this
formula in other chapters are given as “(II.6.7)”. 

References to the bibliography: whenever we write, say, “Euler (1737)” or 
“(Euler 1737)”, we refer to a text of Euler’s published in 1737, detailed references 
to which are in the bibliography at the end of the book. We occasionally give more 
precise indications, as for instance “(Euler 1737, p. 25)”. This is intended to help 
the reader who wishes to look up the original sources and to appreciate the often 
elegant and enthusiastic texts of the pioneers. When there is no corresponding 
entry in the bibliography, we either omit the parentheses or write, for example, 
“(in 1580)”. 

Quotations: we have included many quotations from the literature. Those ap- 
pearing in the text are usually translated into English; the non-English originals 
can be consulted in the Appendix. They are intended to give the flavor of math- 
ematics as an international science with a long history, sometimes to amuse, and 
also to compensate those readers without easy access to a library with old books. 
When the source of a quotation is not included in the bibliography, its title is indicated
directly, as for example the book by Brieskorn and Knörrer from which we
have quoted above.

Acknowledgments: the text was processed in plain TeX on our Sun workstations
at the University of Geneva using macros from Springer-Verlag New
York. We are grateful for the help of J.M. Naef, “Mr. Sun” of the “Services
Informatiques” of our university. The figures are either copies from old books
(photographed by J.M. Meylan from the Geneva University Library and by A. Perruchoud)
or have been computed with our Fortran codes and included as Postscript
files. The final printing was done on the 1200dpi laser printer of the Psychology
Department in Geneva. We also thank the staff of the mathematics department
library and many colleagues, in particular R. Bulirsch, P. Deuflhard, Ch. Lubich,
R. März, A. Ostermann, J.-Cl. Pont, and J.M. Sanz-Serna for valuable comments
and hints. Last but surely not least we want to thank Dr. Ina Lindemann and her
équipe from Springer-Verlag New York for all her help, competent remarks, and
the agreeable collaboration.

March 1995 E. Hairer and G. Wanner. 

Preface to the 2nd, 3rd, and 4th Corrected Printings. These new printings al- 
lowed us to correct several misprints and to improve the text in many places. In 
particular, we give a more geometric exposition of Tartaglia’s solution of the cubic 
equation, improve the treatment of envelopes, and give a more complete proof of 
the transformation formula of multiple integrals. We are grateful to many students 
and colleagues who have helped us to discover errors and possible improvements, 
in particular R.B. Burckel, H. Fischer, J.-L. Gaudin, and H.-M. Maire. We would
like to address special thanks to Y. Kanie, the translator of the Japanese edition.


I

Introduction to Analysis of the Infinite 


. . . our students of mathematics would profit much more from a study
of Euler’s Introductio in Analysin Infinitorum, rather than of the available
modern textbooks.

(André Weil 1979, quoted by J.D. Blanton 1988, p. xii)

. . . since the teacher was judicious enough to allow his unusual pupil (Jacobi)
to occupy himself with Euler’s Introductio, while the other pupils
made great efforts ....

(Dirichlet 1852, speech in commemoration of Jacobi, in Jacobi’s Werke, vol. I, p. 4)


This chapter explains the origin of elementary functions and the impact of Des- 
cartes’s “Géométrie” on their calculation. The interpolation polynomial leads to 
Newton’s binomial theorem and to the infinite series for exponential, logarith- 
mic, and trigonometric functions. The chapter ends with a discussion of complex 
numbers, infinite products, and continued fractions. The presentation follows the 
historical development of this subject, with the mathematical rigor of the period. 
The justification of dubious conclusions will be an additional motivation for the 
rigorous treatment of convergence in Chapter III. 

Large parts of this chapter — as well as its title — were inspired by Euler’s 
Introductio in Analysin Infinitorum (1748).


I.1 Cartesian Coordinates and Polynomial Functions


As long as Algebra and Geometry were separated, their progress was slow 
and their use limited; but once these Sciences were united, they lent each
other mutual support and advanced rapidly together towards perfection. We 
owe to Descartes the application of Algebra to Geometry; this has become 
the key to the greatest discoveries in all fields of mathematics. 

(Lagrange 1795, Oeuvres, vol. 7, p. 271) 

Greek civilization produced the first great flowering of mathematical talent. Start- 
ing with Euclid’s era (~ 300 B.C.), Alexandria became the world center of sci- 
ence. The city was devastated three times (in 47 B.C. by the Romans, in 392 by 
the Christians, and finally in 640 by the Moslems), and this led to the decline of 
this civilization. Following the improvement of Arabic writing (necessary for the 
Koran), Arab writers eagerly translated the surviving fragments of Greek works 
(Euclid, Aristotle, Plato, Archimedes, Apollonius, Ptolemy), as well as Indian 
arithmeticians, and started new research in mathematics. Finally, during the Crusades
(1100-1300), the Europeans discovered this civilization; Gerard of Cremona
(1114-1187), Robert of Chester (XIIth century), Leonardo da Pisa (“Fibonacci”,
around 1200) and Regiomontanus (1436-1476) were the main translators and the 
first scientists of Western Europe. 

At that time, mathematics were clearly separated: on one side algebra, on the 
other geometry. 


Algebra 


Diophantus can be considered the inventor of Algebra; . . .

(Lagrange 1795, Oeuvres, vol. 7, p. 219)

Algebra is a heritage from Greek and Oriental antiquity. The famous book Al-jabr
w’al muqâbala by Mohammed ben Musa Al-Khowârizmî 1 (A.D. 830) starts by
dealing with the solution of quadratic equations. The oldest known manuscript
dates from 1342 and begins as follows: 2



1 The words “algebra” and “algorithm” originate from Al-jabr and Al-Khowârizmî, respectively.

2 This picture as well as Figs. 1.1 and 1.2 are reproduced with permission of the Bodleian
Library, University of Oxford, Ms. Huntington 214, folios 1R, 4R and 4V. English
translation: F. Rosen (1831).

Al-Khowârizmî’s Examples. Consider the quadratic equation

(1.1)    $x^2 + 10x = 39$.


Such an equation hides the unknown solution $x$, which is called by the Arabs dshidr
(root), a word that originally stood for the side of a square of a given surface (“A 
root is any quantity which is to be multiplied by itself”, F. Rosen 1831, p. 6). 


Manuscript of 1342        Modern Drawing

FIGURE 1.1. Solution of $x^2 + 10x = 39$


Solution. Al-Khowârizmî sketches a square of side $x$ to represent $x^2$ and two
rectangles of sides 5 and $x$ for the term $10x$ (see Fig. 1.1). Equation (1.1) shows
that the shaded region of Fig. 1.1 is 39; consequently, the area of the whole square
is $39 + 25 = 64 = 8 \cdot 8$, thus $5 + x = 8$ and $x = 3$.
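
The geometric argument can be replayed numerically. The following is only a sketch and not part of the original text: the function name `complete_square_root` is chosen here for illustration, and it assumes an equation of the type $x^2 + bx = c$ with positive $b$ and $c$. It returns the positive root read off from the completed square.

```python
import math

def complete_square_root(b, c):
    """Positive root of x^2 + b*x = c (b, c > 0) by completing the square."""
    half = b / 2                    # side of the corner square (5 in Fig. 1.1)
    big_area = c + half * half      # area of the completed square (39 + 25 = 64)
    big_side = math.sqrt(big_area)  # side of the completed square (8)
    return big_side - half          # because big_side = half + x

x = complete_square_root(10, 39)
print(x)  # 3.0
```

Only the positive root appears, since the drawing represents $x$ as a length.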


Manuscript of 1342        Modern Drawing

FIGURE 1.2. Solution of $x^2 + 21 = 10x$


With a second example (from Al-Khowârizmî),

(1.2)    $x^2 + 21 = 10x$

(or, if you prefer the Latin of Robert of Chester’s translation: “Substancia vero et
21 dragmata 10 rebus equiparantur”), we demonstrate that different signs require
different figures. To obtain its solution we sketch a square for $x^2$ and we attach
a rectangle of width $x$ and of unknown length for the 21 (Fig. 1.2). Because of
(1.2), the total figure has length 10. It is split in the middle and the small rectangle
(A) contained between $x^2$ and the bisecting line is placed on top (B). This gives a
figure of height 5. The gray area is 21 and the complete square (gray and black) is
$5 \cdot 5 = 25$. Consequently, the small black square must be $25 - 21 = 4 = 2 \cdot 2$, so that
$5 - x = 2$ and we obtain $x = 3$. Using a similar drawing (you can have a try), Al-Khowârizmî
also finds the second solution $x = 7$.

Mohammed ben Musa Al-Khowârizmî describes his solution as follows
(Rosen 1831, p. 11):

... for instance, “a square and twenty-one in numbers are equal to ten roots of the same
square.” That is to say, what must be the amount of a square, which, when twenty-one
dirhems are added to it, becomes equal to the equivalent of ten roots of that square?
Solution: Halve the number of the roots; the moiety is five. Multiply this by itself; the product
is twenty-five. Subtract from this the twenty-one which are connected with the square; the
remainder is four. Extract its root; it is two. Subtract this from the moiety of the roots,
which is five; the remainder is three. This is the root of the square which you required, and
the square is nine. Or you may add the root to the moiety of the roots; the sum is seven;
this is the root of the square which you sought for, and the square itself is forty-nine.

As an application, Al-Khowârizmî solves the following puzzle: “I have divided
10 into two parts, and multiplying one of these by the other, the result was
21”. Putting for one of the two parts $x$ and the other $10 - x$, and multiplying them,
we obtain

(1.3)    $x \cdot (10 - x) = 21$

which is equivalent to (1.2). Hence, the solution is given by the two roots of
Eq. (1.2), i.e., 3 and 7 or vice versa.
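
The verbal recipe quoted from Rosen's translation can be followed line by line in a short program. This is a sketch under the assumption that the equation has the form $x^2 + c = bx$; the function name `roots_square_plus_number` and its error handling are illustrative and not taken from the source.

```python
import math

def roots_square_plus_number(b, c):
    """Both roots of x^2 + c = b*x, following the verbal recipe step by step."""
    moiety = b / 2                     # "halve the number of the roots"
    remainder = moiety * moiety - c    # "multiply this by itself", then subtract c
    if remainder < 0:
        raise ValueError("no real solution: (b/2)^2 < c")
    r = math.sqrt(remainder)           # "extract its root"
    return moiety - r, moiety + r      # "subtract this ..." or "you may add"

small, large = roots_square_plus_number(10, 21)
print(small, large)                    # 3.0 7.0
print(small + large, small * large)    # 10.0 21.0
```

The last line is the puzzle (1.3) read backwards: the two roots are the two parts of 10 whose product is 21.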

The Solution for Equations of Degree 3. 

Tartalea presented his solution in bad Italian verse . . .

(Lagrange 1795, Oeuvres, vol. 7, p. 22)

... I have discovered the general rule, but for the moment I want to keep it
secret for several reasons.

(Tartaglia 1530, see M. Cantor 1891, vol. II, p. 485)

For example, let us try to solve

(1.4)    $x^3 + 6x = 20$,

or, in “bad” Italian verse, “Quando che’l cubo con le cose appresso, Se agguaglia
à qualche numero discreto ...” (see M. Cantor 1891, vol. II, p. 488). Nicolo
Tartaglia (1499-1557) and Scipione dal Ferro (1465-1526) found independently
the method for solving the problem, but they kept it secret in order to win competitions.
Under pressure, and lured by false promises, Tartaglia divulged it to
Gerolamo Cardano (1501-1576), veiled in verses and without derivation
(“suppressa demonstratione”). Cardano reconstructed the derivation with great difficulty
(“quod difficillimum fuit”) and published it in his “Ars Magna” 1545 (see
also di Pasquale 1957, and Struik 1969, p. 63-67).

Derivation. We represent $x^3$ by a cube with edges of length $x$ (what else?, gray in
Fig. 1.3a); the term $6x$ is attached in the form of 3 square prisms of volume $x^2 v$
and three of volume $x v^2$ (white in Fig. 1.3a). We obtain a body of volume 20 (by
(1.4)) which is the difference of a cube $u^3$ and a cube $v^3$ (see Fig. 1.3a), i.e.,

    $u^3 - v^3 = 20$,

where

(1.5)    $u = x + v$.


FIGURE 1.3a. Cubic equation (1.4)        FIGURE 1.3b. Justification of (1.6)

FIGURE 1.3c. Extract from Cardano, Ars Magna 1545, ed. Basilea 1570 3


Arranging the six new prisms as in Fig. 1.3b, we see that their volume is equal to
$6x$ (what is required) if

(1.6)    $3uvx = 6x$    or    $uv = 2$.

We now know the sum (= 20) and the product (= -8) of $u^3$ and $-v^3$ and can
thus reconstruct these two numbers, as in Al-Khowârizmî’s puzzle (1.3), as

    $u^3 = 10 + \sqrt{108}, \qquad -v^3 = 10 - \sqrt{108}$.

Taking then cube roots and using $x = u - v$ we obtain (see the facsimile in
Fig. 1.3c)

(1.7)    $x = \sqrt[3]{\sqrt{108} + 10} - \sqrt[3]{\sqrt{108} - 10}$.
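
A quick floating-point check makes formula (1.7) less mysterious. The lines below are an illustrative sketch, not part of the original text; they evaluate the two cube roots and test conditions (1.6) and (1.4) up to rounding error.

```python
import math

s = math.sqrt(108)
u = (s + 10) ** (1 / 3)   # u^3 = 10 + sqrt(108)
v = (s - 10) ** (1 / 3)   # v^3 = sqrt(108) - 10, so both radicands are positive
x = u - v

print(round(x, 12))             # 2.0
print(round(u * v, 12))         # 2.0, condition (1.6)
print(round(x**3 + 6 * x, 9))   # 20.0, equation (1.4)
```

Numerically, the nested radicals collapse to the integer 2.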


3 Reproduced with permission of Bibl. Publ. Univ. Genève.

Some years later a method of solving equations of degree 4 was found (Ludovico
Ferrari, see Struik 1969, p. 69f, and Exercises 1.1 and 1.2); the equation of
degree 5 remained a mystery for centuries, until Abel’s proof about the impossibility
of solutions by radicals in 1826.


“Algebra Nova” 


The Numerical Logistic is the one displayed and treated by numbers; the 
Specific is displayed by kinds or forms of things: as by the letters of the 
Alphabet. (Viète 1600, Algebra nova, French edition 1630)

ALGEBRA is a general Method of Computation by certain Signs and Sym- 
bols which have been contrived for this Purpose, and found convenient. 

(Maclaurin 1748, A Treatise of Algebra, p. 1) 

The ancient texts dealt only with particular examples and their authors carried
out “arithmetical” calculations using only numbers. François Viète (= Franciscus
Vieta 1540-1603, 1591 In artem analyticam isagoge, 1600 Algebra nova) had the
fundamental idea of writing letters $A, B, C, X, \ldots$ for the known and unknown
quantities of a problem (often geometric) and to use these letters for algebraic
calculations (see the facsimile in Fig. 1.4a). Since no problem of the Greek era
appeared to resist the method


    Geometrical Problem  →  [put letters]  →  Algebraic Problem  →  [calculations]  →  Solution


Viète wrote in capital letters “NVLLVM NON PROBLEMA SOLVERE” (i.e.,
“GIVING SOLVTION TO ANY PROBLEM”). The perfection of this idea led to
Descartes’s “Geometry”.


FIGURE 1.4a. Facsimile of the French edition (1630) of Viète (1600) 4


4 Reproduced with permission of Bibl. Publ. Univ. Genève.


FIGURE 1.4b. Extracts of Viète (1591a) 5 (Opera p. 129 and 150); Solution of $A^2 + 2BA = Z$
and $A^3 - 3BA = 2Z$
Z and A 3 - 3 BA = 2 Z 


Example. (Trisection of an angle). The famous classical problem “Datum angulum
in tres partes æquales secare” becomes, with the help of

(1.8)    $\sin(3\alpha) = 3 \sin\alpha \cos^2\alpha - \sin^3\alpha$

(see (4.14) below) and of some simple calculations, the algebraic equation

(1.9)    $-4X^3 + 3X = B$

(see Viète 1593, Opera, p. 290). Its solution is obtained from (1.14) below.
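
The “simple calculations” can be checked numerically. The sketch below is not from the source; it assumes the reading $X = \sin\alpha$ and $B = \sin(3\alpha)$, which follows from (1.8) after replacing $\cos^2\alpha$ by $1 - \sin^2\alpha$. The function name `trisection_residual` is illustrative.

```python
import math

def trisection_residual(alpha):
    """Value of -4*X^3 + 3*X - B for X = sin(alpha), B = sin(3*alpha)."""
    X = math.sin(alpha)
    B = math.sin(3 * alpha)
    return -4 * X**3 + 3 * X - B

# The residual should vanish up to rounding error for any angle.
for degrees in (10, 20, 45, 80):
    alpha = math.radians(degrees)
    print(degrees, abs(trisection_residual(alpha)) < 1e-12)
```

Trisecting a given angle $3\alpha$ thus amounts to solving the cubic (1.9) for $X$ when $B$ is known.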

Formula for the Equation of Degree 2. In Viète’s notation, the complicated text
by Al-Khowârizmî (see p. 4) becomes the “formula”

(1.10)    $x^2 + ax + b = 0 \implies x_1, x_2 = -a/2 \pm \sqrt{a^2/4 - b}$.

Formula for the Equation of Degree 3.

(1.11)    $y^3 + ay^2 + by + c = 0 \;\overset{y + a/3 = x}{\implies}\; x^3 + px + q = 0$.

We set $x = u + v$ (this corresponds to (1.5) with “$-v$” replaced by “$v$”), so that
Eq. (1.11) becomes

(1.12)    $u^3 + v^3 + (3uv + p)(u + v) + q = 0$.

Putting $uv = -p/3$ (this corresponds to (1.6)), we obtain

(1.13)    $u^3 + v^3 = -q, \qquad u^3 v^3 = -p^3/27$.

By Al-Khowârizmî’s puzzle (1.3) and formula (1.10), we get (see the facsimile
in Fig. 1.4b),

(1.14)    $x = \sqrt[3]{-q/2 + \sqrt{q^2/4 + p^3/27}} + \sqrt[3]{-q/2 - \sqrt{q^2/4 + p^3/27}}$.
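
Formula (1.14) and the substitution in (1.11) translate directly into a few lines of code. What follows is a sketch with assumptions of our own: the names `cbrt`, `depress` and `cardano_real_root` are illustrative, and only the case $q^2/4 + p^3/27 \ge 0$ is handled, since otherwise the inner square root is not real.

```python
import math

def cbrt(t):
    """Real cube root, also for negative t."""
    return math.copysign(abs(t) ** (1 / 3), t)

def depress(a, b, c):
    """(p, q) of x^3 + p*x + q = 0 obtained from y^3 + a*y^2 + b*y + c = 0 with x = y + a/3."""
    p = b - a * a / 3
    q = 2 * a**3 / 27 - a * b / 3 + c
    return p, q

def cardano_real_root(p, q):
    """One real root of x^3 + p*x + q = 0 via (1.14), real case only."""
    disc = q * q / 4 + p**3 / 27
    if disc < 0:
        raise ValueError("q^2/4 + p^3/27 < 0: the formula needs complex numbers")
    root = math.sqrt(disc)
    return cbrt(-q / 2 + root) + cbrt(-q / 2 - root)

# Equation (1.4), x^3 + 6x = 20, i.e. p = 6 and q = -20.
print(round(cardano_real_root(6, -20), 12))   # 2.0

# y^3 + 3y^2 + 3y - 7 = 0 is (y + 1)^3 = 8, so y = 1.
p, q = depress(3, 3, -7)
print(round(cardano_real_root(p, q) - 3 / 3, 12))   # 1.0
```

The helper `cbrt` is needed because raising a negative float to the power 1/3 in Python does not return the real cube root.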

5 Reproduced with permission of Bibl. Publ. Univ. Genève. Here, the unknown variable is
$A$. Only with Descartes did the choice of $x, y, z$ for unknowns come into use.


Descartes’s Geometry

Here I beg you to observe in passing that the scruples that prevented ancient 
writers from using arithmetical terms in geometry, and which can only be a 
consequence of their inability to perceive clearly the relation between these 
two subjects, introduced much obscurity and confusion into their explana- 
tions. (Descartes 1637) 

Geometry, the gigantic heritage of Greek antiquity, was brought to Europe thanks 
to the Arabic translations. 

For example, Euclid’s Elements (around 300 B.C.) consist of 13 “Books” 
containing “Definitions”, “Postulates”, in all 465 “Propositions”, that are rigor- 
ously proved. The Conics by Apollonius (200 B.C.) are of equal importance. 

Nevertheless, different unsolved problems eluded the efforts of these scien- 
tists: trisection of the angle, quadrature of the circle, and the problem mentioned 
by Pappus (in the year 350), which inspired Descartes’s research. 

Problem by Pappus. (“The question, then, the solution of which was begun by
Euclid and carried farther by Apollonius, but was completed by no one, is this”):
Let three straight lines $a, b, c$ and three angles $\alpha, \beta, \gamma$ be given. For a point $C$,
arbitrarily chosen, let $B, D, F$ be points on $a, b, c$ such that $CB$, $CD$, $CF$ form with
$a, b, c$ the angles $\alpha, \beta, \gamma$, respectively (see Figs. 1.5a and 1.5b). We wish to find
the locus of points $C$ for which

(1.15)    $CB \cdot CD = (CF)^2$.

Descartes solved this problem using Viète’s “new” and prestigious algebra; the
point $C$ is determined by the distances $AB$ and $BC$. These two “unknown values”
are denoted by the letters “$x$” and “$y$” (“Que le segment de la ligne AB, qui est
entre les points A & B, soit nommé x. & que BC soit nommé y”).

For the moment, consider only two of these straight lines (Fig. 1.5c) (“& pour
me demesler de la confusion de toutes ces lignes . . .”). We draw the parallel to $EF$
passing through $C$. All angles being given, we see that there are constants $K_1$ and
$K_2$ such that

    $u = K_1 \cdot CF, \qquad v = K_2 \cdot y$.

As $AE = x + u + v = K_3$, we get

(1.16)    $CF = d + ix + ky$,    $d, i, k$ constants.

Similarly,

(1.17)    $CD = mx + ny$,    $m, n$ constants.

(“And thus you see that, ... the length of any such line . . . can always be expressed
by three terms, one of which consists of the unknown quantity $y$ multiplied or
divided by some known quantity; another consisting of the unknown quantity $x$
multiplied or divided by some other known quantity; and the third consisting of a
known quantity. An exception must be made in the case where the given lines are
parallel . . .” Descartes 1637, p. 312, transl. D.E. Smith and M.L. Latham 1925).


FIGURE 1.5b. Problem by Pappus        FIGURE 1.5c. Equation of a straight line


Thus the condition (1.15) becomes

    $y \cdot (mx + ny) = (d + ix + ky)^2$,

which is an equation of the form

(1.18)    $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$.

For each arbitrary $y$, (1.18) becomes a quadratic equation that is solved by algebra
(see (1.10)). Coordinate transformations show that (1.18) always represents a
conic.
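
The passage from the condition (1.15) to the general form (1.18) is a matter of expanding the square and collecting terms. The sketch below is our own illustration (the function names and the numerical constants are hypothetical); it moves everything to one side, $(d + ix + ky)^2 - y(mx + ny) = 0$, and checks the collected coefficients at a sample point.

```python
def pappus_conic(d, i, k, m, n):
    """Coefficients (A, B, C, D, E, F) of (1.18) for (d + i*x + k*y)^2 - y*(m*x + n*y) = 0."""
    A = i * i            # from (i*x)^2
    B = 2 * i * k - m    # mixed term of the square minus m*x*y
    C = k * k - n        # from (k*y)^2 minus n*y^2
    D = 2 * d * i
    E = 2 * d * k
    F = d * d
    return A, B, C, D, E, F

def conic_value(coeffs, x, y):
    """Left-hand side of (1.18) at the point (x, y)."""
    A, B, C, D, E, F = coeffs
    return A * x * x + B * x * y + C * y * y + D * x + E * y + F

# Hypothetical constants, chosen only for illustration.
d, i, k, m, n = 1, 2, -1, 3, 5
coeffs = pappus_conic(d, i, k, m, n)
x, y = 4, 7
lhs = y * (m * x + n * y)
rhs = (d + i * x + k * y) ** 2
print(conic_value(coeffs, x, y) == rhs - lhs)   # True
```

With integer inputs the comparison is exact, so the check confirms the bookkeeping of the six coefficients rather than a numerical approximation.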


Fig. 1.5a is reproduced with permission of Bibl. Publ. Univ. Genève.


Polynomial Functions

Algebra not only helps geometry, but geometry also helps algebra, because the
Cartesian coordinates show algebra in a new light. In fact, if instead of (1.1) and
(1.4) we consider

(1.19)    $y = x^2 + 10x - 39, \qquad y = x^3 + 6x - 20$

and if we attribute arbitrary values to $x$, then for each $x$ we can compute a value
for $y$ and can study the curves obtained in this way (Fig. 1.6). The roots of (1.1) or
(1.4) appear as the points of intersection of these curves with the $x$-axis (horizontal
axis). For example, we discover that the solution of (1.4) is simply $x = 2$ (a bit
nicer than Eq. (1.7)).
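
Reading roots off a curve can be imitated by looking for a sign change of $y$ and halving the interval repeatedly. Bisection is our choice for this sketch and is not a method described in the text; the name `bisect_root` and the search intervals are assumptions made for illustration.

```python
def bisect_root(f, lo, hi, tol=1e-12):
    """Locate a sign change of f in [lo, hi] by repeated halving."""
    flo, fhi = f(lo), f(hi)
    if flo == 0:
        return lo
    if fhi == 0:
        return hi
    if flo * fhi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if fmid == 0:
            return mid
        if flo * fmid < 0:
            hi = mid               # the crossing lies in the left half
        else:
            lo, flo = mid, fmid    # the crossing lies in the right half
    return (lo + hi) / 2

def quadratic(x):
    return x**2 + 10 * x - 39      # first curve of (1.19)

def cubic(x):
    return x**3 + 6 * x - 20       # second curve of (1.19)

print(round(bisect_root(quadratic, 0, 10), 9))    # 3.0
print(round(bisect_root(quadratic, -20, 0), 9))   # -13.0
print(round(bisect_root(cubic, 0, 10), 9))        # 2.0
```

The second line shows the negative intersection $x = -13$ of the parabola, which the geometric argument with areas does not produce.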




(1.1) Definition. A polynomial is an expression of the form

    $y = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_0$,

where $a_0, a_1, \ldots, a_n$ are arbitrary constants. If $a_n \neq 0$, the polynomial is of
degree $n$.


Interpolation Problem. Given $n + 1$ points $x_i, y_i$ (see Fig. 1.7), we look for a
polynomial of degree $n$ passing through all these points. We are mainly interested
in the situation where the $x_i$ are equidistant, and in particular where

    $x_0 = 0, \quad x_1 = 1, \quad x_2 = 2, \quad x_3 = 3, \quad \ldots$ .

The solution of this problem, which was very useful in the computation of logarithms
and maritime navigation, emerged in the early 17th century from the work
of Briggs and Sir Thomas Harriot (see Goldstine 1977, p. 23f). Newton (1676)
attacked the problem in the spirit of Viète’s “algebra nova” (see Fig. 1.8): write
letters for the unknown coefficients of our polynomial, e.g.,

(1.20)    $y = A + Bx + Cx^2 + Dx^3$.


FIGURE 1.8. Problem of interpolation by Newton (1676, Methodus Differentialis)

The values $y_0, y_1, y_2, y_3$ having been given, we transform the “problem” into
“algebraic equations”


    Abscissæ      Ordinatæ
    $x = 0$       $A = y_0$
    $x = 1$       $A + B + C + D = y_1$
    $x = 2$       $A + 2B + 4C + 8D = y_2$
    $x = 3$       $A + 3B + 9C + 27D = y_3$

Reproduced with permission of Bibl. Publ. Univ. Genève.

Here, we notice that the value $A$ disappears if we subtract the equations, the 1st
from the 2nd, the 2nd from the 3rd, the 3rd from the 4th:

(1.21)    $B + C + D = y_1 - y_0 =: \Delta y_0$
          $B + 3C + 7D = y_2 - y_1 =: \Delta y_1$
          $B + 5C + 19D = y_3 - y_2 =: \Delta y_2$.

$B$ disappears if we subtract once again:

(1.22)    $2C + 6D = \Delta y_1 - \Delta y_0 =: \Delta^2 y_0$
          $2C + 12D = \Delta y_2 - \Delta y_1 =: \Delta^2 y_1$,

and then so does $C$:

(1.23)    $6D = \Delta^2 y_1 - \Delta^2 y_0 =: \Delta^3 y_0$.

This gives us $D$. Then the first equation of (1.22) yields $C$, the first of (1.21) the
value $B$. We arrive at the solution

(1.24)    $y = y_0 + \Delta y_0 \cdot x + \frac{\Delta^2 y_0}{2} \cdot (x^2 - x) + \frac{\Delta^3 y_0}{6} \cdot (x^3 - 3x^2 + 2x)$,

which can also be written as

(1.24')   $y = y_0 + \frac{x}{1} \Delta y_0 + \frac{x(x-1)}{1 \cdot 2} \Delta^2 y_0 + \frac{x(x-1)(x-2)}{1 \cdot 2 \cdot 3} \Delta^3 y_0$.

We will see in the next paragraph, using Pascal’s triangle, that this is a particular
case of a general formula for polynomials of any degree.
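
The elimination above is mechanical enough to be programmed. The following sketch (function names are ours, and exact rational arithmetic with `Fraction` is a choice made here to avoid rounding) forms the repeated differences of the given values and evaluates the resulting polynomial for any number of equidistant points starting at $x = 0$.

```python
from fractions import Fraction

def leading_differences(ys):
    """Return [y_0, Δy_0, Δ²y_0, ...] for values y_0, y_1, ... given at x = 0, 1, 2, ..."""
    row = [Fraction(y) for y in ys]
    leading = []
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]   # next column of differences
    return leading

def newton_interpolate(ys, x):
    """Evaluate the interpolation polynomial through (0, y_0), (1, y_1), ... at x."""
    x = Fraction(x)
    total = Fraction(0)
    term = Fraction(1)    # x(x-1)...(x-k+1) / (1*2*...*k), starting with k = 0
    for k, delta in enumerate(leading_differences(ys)):
        total += term * delta
        term = term * (x - k) / (k + 1)
    return total

cubes = [0, 1, 8, 27]                               # y = x^3 at x = 0, 1, 2, 3
print(leading_differences(cubes))                   # the values 0, 1, 6, 6 as Fractions
print(newton_interpolate(cubes, 5))                 # 125
print(newton_interpolate(cubes, Fraction(1, 2)))    # 1/8
```

Since four values determine a cubic uniquely, the polynomial reproduces $x^3$ also away from the four given points.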


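(1.2) Theorem. The polynomial of degree $n$ taking the values

    $y_0$ (for $x = 0$), $y_1$ (for $x = 1$), . . . , $y_n$ (for $x = n$)

is given by the formula

    $y = y_0 + \frac{x}{1} \Delta y_0 + \frac{x(x-1)}{1 \cdot 2} \Delta^2 y_0 + \ldots + \frac{x(x-1) \cdots (x-n+1)}{1 \cdot 2 \cdots n} \Delta^n y_0$.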

(1.3) Remark. Since Newton (see Fig. 1.9), it is usual to arrange the differences in
the scheme

(1.25)
    $y_0$
             $\Delta y_0$
    $y_1$                 $\Delta^2 y_0$
             $\Delta y_1$                 $\Delta^3 y_0$
    $y_2$                 $\Delta^2 y_1$                 $\Delta^4 y_0$
             $\Delta y_2$                 $\Delta^3 y_1$
    $y_3$                 $\Delta^2 y_2$
             $\Delta y_3$
    $y_4$

where

    $\Delta y_i = y_{i+1} - y_i$
    $\Delta^2 y_i = \Delta y_{i+1} - \Delta y_i$
    $\Delta^3 y_i = \Delta^2 y_{i+1} - \Delta^2 y_i$, etc.

FIGURE 1.9. Newton’s scheme of differences (Newton 1676, Methodus Differentialis)

and we obtain the formula

    $y = x + 7 \, \frac{x(x-1)}{2} + 12 \, \frac{x(x-1)(x-2)}{6} + 6 \, \frac{x(x-1)(x-2)(x-3)}{24} = \frac{x^4}{4} + \frac{x^3}{2} + \frac{x^2}{4}$.

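Similarly, we obtain

(1.26)
    $1 + 2 + \ldots + n = \frac{n^2}{2} + \frac{n}{2}$
    $1^2 + 2^2 + \ldots + n^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}$
    $1^3 + 2^3 + \ldots + n^3 = \frac{n^4}{4} + \frac{n^3}{2} + \frac{n^2}{4}$
    $1^4 + 2^4 + \ldots + n^4 = \frac{n^5}{5} + \frac{n^4}{2} + \frac{n^3}{3} - \frac{n}{30}$
    $1^5 + 2^5 + \ldots + n^5 = \frac{n^6}{6} + \frac{n^5}{2} + \frac{5n^4}{12} - \frac{n^2}{12}$.

Jacob Bernoulli (1705) found the general formula

    $1^q + 2^q + \ldots + n^q = \frac{n^{q+1}}{q+1} + \frac{n^q}{2} + \frac{q}{2} A n^{q-1} + \frac{q(q-1)(q-2)}{2 \cdot 3 \cdot 4} B n^{q-3} + \frac{q(q-1)(q-2)(q-3)(q-4)}{2 \cdot 3 \cdot 4 \cdot 5 \cdot 6} C n^{q-5} + \ldots$,

where

(1.27)    $A = \frac{1}{6}, \quad B = -\frac{1}{30}, \quad C = \frac{1}{42}, \quad D = -\frac{1}{30}, \quad E = \frac{5}{66}, \quad F = -\frac{691}{2730}, \quad \ldots$

are the so-called Bernoulli numbers. For an elegant explanation see Sect. II.10
below.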
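
The Bernoulli numbers and the sums of powers can be recomputed exactly. The sketch below is ours, not the book's: it uses the standard recurrence for the modern numbers $B_0, B_1, B_2, \ldots$ (of which $B_2, B_4, B_6, \ldots$ correspond to Bernoulli's $A, B, C, \ldots$) and the modern compact form of the sum formula, with the convention $B_1 = +1/2$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """B_0, ..., B_m as exact fractions, with the convention B_1 = +1/2."""
    B = [Fraction(1)]
    for k in range(1, m + 1):
        s = sum(comb(k + 1, j) * B[j] for j in range(k))
        B.append(-s / (k + 1))
    if m >= 1:
        B[1] = Fraction(1, 2)   # sign that matches sums running up to n (not n - 1)
    return B

def power_sum(q, n):
    """1^q + 2^q + ... + n^q computed from the Bernoulli numbers."""
    B = bernoulli_numbers(q)
    total = sum(comb(q + 1, j) * B[j] * Fraction(n) ** (q + 1 - j)
                for j in range(q + 1))
    return total / (q + 1)

B = bernoulli_numbers(12)
print(B[2], B[4], B[6], B[8], B[10], B[12])   # 1/6 -1/30 1/42 -1/30 5/66 -691/2730
print(power_sum(5, 10), sum(k**5 for k in range(1, 11)))   # 220825 220825
```

The odd-indexed numbers beyond $B_1$ vanish, which is why Bernoulli's formula only contains every second power of $n$ after the first two terms.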


Exercises 

1.1 The following problem, in Viète’s notation,

    $x + y + z = 20$
    $x : y = y : z$
    $xy = 8$

was proposed the 15th of December 1536 by Zuanne de Tonini da Coi (Colla)
to Tartaglia, who could not solve it (see Notari 1924). Eliminate the variables
$x$ and $z$ and understand why. Cardano later handed the problem over to Ferrari
who found the solution (see next Exercise). It is not astonishing that later
Ferrari and Tartaglia exchanged ugly letters with heated disputes on mathematical
questions.

1.2 Reconstruct Ferrari’s solution of the biquadratic equation

(1.28)    $x^4 + ax^2 = bx + c$.

Hint. a) Add $a^2/4$ on both sides to obtain

    $\left(x^2 + \frac{a}{2}\right)^2 = bx + c + \frac{a^2}{4}$.

b) Take $y$ as a parameter and add $y^2 + ay + 2x^2 y$ on both sides to obtain

    $\left(x^2 + \frac{a}{2} + y\right)^2 = 2x^2 y + bx + y^2 + ay + c + \frac{a^2}{4}$.

c) The expression to the right, when written as $Ax^2 + Bx + C$, is of the form
$(\alpha x + \beta)^2$ if $B^2 = 4AC$. This leads to a third order equation for $y$.

d) Having found a $y$ satisfying this with Cardano’s formula (1.14), you obtain

    $\left(x^2 + \frac{a}{2} + y\right) = \pm(\alpha x + \beta)$

with two roots each.

Remark. Every polynomial equation $z^4 + az^3 + bz^2 + cz + d = 0$ can be reduced to
the form (1.28) by the transformation $x = z + a/4$.

1.3 (Euler 1749, Opera Omnia, vol. VI, p. 78-147). Solve the equation of degree 4

    $x^4 + Bx^2 + Cx + D = 0$

by comparing the coefficients in

    $x^4 + Bx^2 + Cx + D = (x^2 + ux + \alpha)(x^2 - ux + \beta)$

and finding an equation of degree 3 for $u^2$. Solve this equation and compute
the solutions of two quadratic equations.

1.4 (L. Euler 1770, Vollst. Anleitung zur Algebra, St. Petersburg, Opera Omnia,
vol. I). Consider an equation of degree 4 with symmetric coefficients, e.g.,

(1.29)    $x^4 + 5x^3 + 8x^2 + 5x + 1 = 0$.

Decompose the polynomial as $(x^2 + rx + 1)(x^2 + sx + 1)$ and find the four
solutions of (1.29).

Remark. Another possibility for the solution of (1.29) is to divide the equation
by $x^2$ and to use the new variable $u = x + x^{-1}$.

1.5 Problem proposed by Armenia/Australia for the 35th international mathematical
olympiad (held in Hong Kong, July 12-19, 1994). $ABC$ is an isosceles
triangle with $AB = AC$. Suppose that (i) $M$ is the midpoint of $BC$ and $O$ is



[Facsimile: “Summae Potestatum”, table of the sums of powers. Jac. Bernoulli, Ars conj. 1705]


These figures are reproduced with permission of Bibl. Publ. Univ. Gt

I.2 Exponentials and the Binomial Theorem

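Here it will be proper to observe, that I make use of $x^{-1}$, $x^{-2}$, $x^{-3}$, $x^{-4}$,
&c. for $\frac{1}{x}$, $\frac{1}{x^2}$, $\frac{1}{x^3}$, $\frac{1}{x^4}$, &c. of $x^{1/2}$, $x^{3/2}$, $x^{5/2}$, $x^{1/3}$, $x^{2/3}$, &c.
for $\sqrt{x}$, $\sqrt{x^3}$, $\sqrt{x^5}$, $\sqrt[3]{x}$, $\sqrt[3]{x^2}$, &c. and of $x^{-1/2}$, $x^{-2/3}$, $x^{-1/4}$, &c.
for $\frac{1}{\sqrt{x}}$, $\frac{1}{\sqrt[3]{x^2}}$, $\frac{1}{\sqrt[4]{x}}$, &c. And
this by rule of Analogy, as may be apprehended from such Geometrical
Progressions as these; $x^3$, $x^{5/2}$, $x^2$, $x^{3/2}$, $x$, $x^{1/2}$, $x^0$, (or 1;) $x^{-1/2}$, $x^{-1}$, $x^{-3/2}$,
$x^{-2}$, &c. (Newton 1671, Fluxiones, Engl. pub. 1736, p. 3)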

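For a given number a, we write

(2.1)  $a^1 = a, \quad a^2 = a \cdot a, \quad a^3 = a \cdot a \cdot a, \quad a^4 = a \cdot a \cdot a \cdot a, \quad \ldots$

This notation emerged slowly, mainly through the work of Bombelli in 1572, Simon
Stevin in 1585, Descartes, and Newton (see quotation). If we multiply, e.g.,

$a^2 \cdot a^3 = (a \cdot a) \cdot (a \cdot a \cdot a) = a \cdot a \cdot a \cdot a \cdot a = a^5,$

we see the rule

(2.2)  $a^n \cdot a^m = a^{n+m}.$

In the geometric progression (2.1), every term is equal to its predecessor multiplied
by a. We can also continue this sequence to the left by dividing the terms by
a. This leads to

$\ldots, \quad a^{-2} = \frac{1}{a \cdot a}, \quad a^{-1} = \frac{1}{a}, \quad a^0 = 1, \quad a^1 = a, \quad a^2 = a \cdot a, \quad \ldots$

where we have used the notation

(2.3)  $a^{-m} = \frac{1}{a^m}.$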


In this way, formula (2.2) remains valid also for negative exponents. Next, multiplying
1 repeatedly by $\sqrt{a}$ (where a has to be a positive number), we obtain a
geometric progression

$1, \quad \sqrt{a}, \quad \sqrt{a} \cdot \sqrt{a} = a, \quad \sqrt{a} \cdot \sqrt{a} \cdot \sqrt{a}, \quad \sqrt{a} \cdot \sqrt{a} \cdot \sqrt{a} \cdot \sqrt{a} = a^2, \quad \ldots$

which suggests the notation

(2.4)  $a^{m/n} = \sqrt[n]{a^m}.$

Now formula (2.2) remains valid for rational exponents. We take only the positive
roots, so that $a^{5/2}$ lies between $a^2$ and $a^3$. The last step (for mankind) is irrational
exponents, which are, as Euler says, “more difficult to understand”. But “Sic
erit valor determinatus intra limites $a^2$ et $a^3$ comprehensus”, tells us that $a^{\sqrt{7}}$ is
a value between $a^2$ and $a^3$, between $a^{26/10}$ and $a^{27/10}$, between $a^{264/100}$ and
$a^{265/100}$, between $a^{2645/1000}$ and $a^{2646/1000}$, and so on.

Binomial Theorem

Although this proposition has an infinite number of cases, I shall give quite 
a short proof of it, by assuming 2 lemmas. 

The 1st, which is self-evident, is that this proportion occurs in the second
base; for it is quite obvious that φ is to σ as 1 is to 1.

The 2nd is that if this proportion occurs in some base, it will necessarily be
true in the next base.

(Pascal 1654, one of the first proofs by induction)

We wish to expand the expression $(a + b)^n$. Multiplying each result in turn by
$(a + b)$ we obtain, successively,


        $(a + b)^0 = 1$
        $(a + b)^1 = a + b$
(2.5)   $(a + b)^2 = a^2 + 2ab + b^2$
        $(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$
        $(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4,$

and so on. There appears an interesting triangle of “binomial coefficients” (Omar
Alkhaijâmâ in 1080, Tshu shi Kih in 1303, M. Stifel 1544, Cardano 1545, Pascal
1654, see Fig. 2.1)

(2.6)
                      1
                    1   1
                  1   2   1
                1   3   3   1
              1   4   6   4   1
            1   5  10  10   5   1
          1   6  15  20  15   6   1
        1   7  21  35  35  21   7   1


in which each number is the sum of its two “superiors”. We want to find a general
law for these coefficients. It is not difficult to see that the first diagonal in this
triangle is composed of “1” and the second (1, 2, 3, . . .) of “n”. For the third
diagonal (1, 3, 6, 10, . . .) we guess “$\frac{n(n-1)}{1 \cdot 2}$”, followed by “$\frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}$”, and
so on. This suggests the following theorem.


(2.1) Theorem (Pascal 1654). For n = 0, 1, 2, . . . we have

$(a + b)^n = a^n + \frac{n}{1}\,a^{n-1}b + \frac{n(n-1)}{1 \cdot 2}\,a^{n-2}b^2 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\,a^{n-3}b^3 + \ldots$

This sum is finite and stops after n + 1 terms.

Proof. We compute the ratio of each number in (2.6) with its left-hand neighbor
(Pascal 1654, p. 7, “Consequence douziesme”).

FIGURE 2.1. Original publication of Pascal’s triangle, Pascal (1654) 1


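(2.7)
                    1/1
                  2/1  1/2
                3/1  2/2  1/3
              4/1  3/2  2/3  1/4
            5/1  4/2  3/3  2/4  1/5
          6/1  5/2  4/3  3/4  2/5  1/6

Here, it is not difficult to guess a general law. We prove this law “by induction on
the row-number” (see quotation). Suppose that

           A     B     C
(2.8)                          D = A + B,   E = B + C
              D     E

is a part of Pascal’s triangle with the “induction hypothesis”

$\frac{B}{A} = \frac{k}{\ell - 1}, \qquad \frac{C}{B} = \frac{k - 1}{\ell}.$

Then,

(2.9)  $\frac{E}{D} = \frac{B + C}{A + B} = \frac{1 + \frac{C}{B}}{\frac{A}{B} + 1} = \frac{1 + \frac{k-1}{\ell}}{\frac{\ell-1}{k} + 1} = \frac{\frac{\ell + k - 1}{\ell}}{\frac{\ell + k - 1}{k}} = \frac{k}{\ell},$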


which means that the same structure is also found in the next line.

1 Fig. 2.1 is reproduced with permission of Bibl. Publ. Univ. Genève.

The fact that the ratios in the nth row of (2.7) are given by n/1, (n − 1)/2,
(n − 2)/3, . . . implies that the coefficients of (2.6) are a product of such ratios;
e.g., the “20” in the 7th line is the product

$20 = \frac{20}{15} \cdot \frac{15}{6} \cdot \frac{6}{1} \overset{(2.7)}{=} \frac{4}{3} \cdot \frac{5}{2} \cdot \frac{6}{1} = \frac{6 \cdot 5 \cdot 4}{3 \cdot 2 \cdot 1},$

and we see that Theorem 2.1 is true in general.
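
The ratio rule used in this proof can be tried out numerically. The following is only a sketch: the function name `pascal_row` is ours, and each entry of row n is obtained from its left-hand neighbor through the ratios n/1, (n − 1)/2, (n − 2)/3, . . . ; Python's `math.comb` serves as an independent check.

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle, built from the ratios n/1, (n-1)/2, (n-2)/3, ..."""
    row = [1]
    for j in range(1, n + 1):
        # each entry equals its left-hand neighbor times (n - j + 1) / j;
        # the product is always divisible by j, so integer division is exact
        row.append(row[-1] * (n - j + 1) // j)
    return row

for n in range(8):
    print(pascal_row(n))

# independent check of one row against the library routine
print(pascal_row(6) == [comb(6, j) for j in range(7)])
```

The row for n = 6 contains the “20” discussed above as the product (6/1)·(5/2)·(4/3).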

These coefficients

(2.10)  $\frac{n(n-1) \cdots (n-j+1)}{1 \cdot 2 \cdots j} = \frac{n(n-1) \cdots (n-j+1)\,(n-j) \cdots 1}{1 \cdot 2 \cdots j \cdot 1 \cdot 2 \cdots (n-j)} = \frac{n!}{j!\,(n-j)!} = \binom{n}{j}$

are called binomial coefficients and $n! = 1 \cdot 2 \cdots n$ is the factorial of n.


Application to the Interpolation Polynomial. Expand the expressions in the difference
scheme (1.25):

y_0
         y_1 − y_0
y_1                    y_2 − 2y_1 + y_0
         y_2 − y_1                          y_3 − 3y_2 + 3y_1 − y_0 .
y_2                    y_3 − 2y_2 + y_1
         y_3 − y_2
y_3

The appearance of Pascal’s triangle is not a coincidence, because each term is the
difference of the two terms to its left.

Furthermore, each term of the scheme (1.25) is the sum of the term above it
with the term to its right. Consequently, the scheme can also be written as

y_0
                           Δy_0
y_0 + Δy_0                                          Δ^2 y_0
                           Δy_0 + Δ^2 y_0                                 Δ^3 y_0 .
y_0 + 2Δy_0 + Δ^2 y_0                               Δ^2 y_0 + Δ^3 y_0
                           Δy_0 + 2Δ^2 y_0 + Δ^3 y_0
y_0 + 3Δy_0 + 3Δ^2 y_0 + Δ^3 y_0

Pascal’s triangle appears again. Formula (2.10) thus yields

$y_n = y_0 + \frac{n}{1}\,\Delta y_0 + \frac{n(n-1)}{1 \cdot 2}\,\Delta^2 y_0 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\,\Delta^3 y_0 + \ldots,$


and this proves Theorem 1.2.

Negative Exponents. We begin with

$(a + b)^{-1} = \frac{1}{a + b}.$

If we assume that |b| < |a|, a first approximation to this ratio is 1/a. We try to
improve this value by an unknown quantity δ,

$\frac{1}{a + b} = \frac{1}{a} + \delta \quad\Rightarrow\quad 1 = 1 + \frac{b}{a} + a\delta + b\delta.$

Since |b| < |a|, we neglect the term bδ and obtain $\delta = -b/a^2$. Repeating this
process again and again (or, more precisely, proceeding by induction), we arrive
at

(2.11)  $(a + b)^{-1} = \frac{1}{a} - \frac{b}{a^2} + \frac{b^2}{a^3} - \frac{b^3}{a^4} + \ldots$

which is the same as Theorem 2.1 for n = −1. This time, however, the series is
infinite.

If we multiply (2.11) by a and put x = b/a, we obtain

(2.12)  $\frac{1}{1 + x} = 1 - x + x^2 - x^3 + x^4 - x^5 + \ldots, \qquad |x| < 1,$

the famous geometrical series (Viète 1593).

Square Roots. Next, we consider $(a + b)^{1/2} = \sqrt{a + b}$. We again suppose b small,
so that $\sqrt{a + b} \approx \sqrt{a}$, and search for a δ such that

$\sqrt{a + b} = \sqrt{a} + \delta$

is a better approximation. Then,

$a + b = (\sqrt{a} + \delta)^2 = a + 2\sqrt{a}\,\delta + \delta^2.$

As δ is small, we neglect $\delta^2$ and have $\delta = b/(2\sqrt{a})$. Consequently,

(2.13)  $\sqrt{a + b} \approx \sqrt{a} + \frac{b}{2\sqrt{a}}.$

Example. Computation of $\sqrt{2}$. We start from an approximate value v = 1.4 and
set $a = v^2$, $b = 2 - a = 2 - v^2$. Then, (2.13) gives as a new approximation

$v + \frac{2 - v^2}{2v} = \frac{1}{2}\Bigl(\frac{2}{v} + v\Bigr),$

a formula that can be applied repeatedly and yields

1.4

1.414285 

1.4142135642 

1.4142135623730950499 

1.4142135623730950488016887242096980790 

1.41421356237309504880168872420969807856967187537694807317667973799 . 
The same calculation performed in base 60 starting with 1, 25 gives 1, 24, 51, 10 
(commas separate digits in base 60), a value found on a Babylonian table dating 
from 1900 B.C. (see Fig. 2.2, see also van der Waerden 1954, Chap. II, Plate 8b). 
This indicates that formula (2.13) has been in use since Babylonian and Greek 
antiquity. 
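
As an illustration, the iteration deduced from (2.13) can be repeated with Python's `decimal` module. This is only a sketch under our own choices: the function name `babylonian_step` and the working precision of 70 digits are assumptions made for the example, not part of the text.

```python
from decimal import Decimal, getcontext

getcontext().prec = 70  # working precision in significant digits (our choice)

def babylonian_step(v, target=Decimal(2)):
    """One application of (2.13) with a = v^2, b = target - v^2,
    i.e. v -> (target / v + v) / 2."""
    return (target / v + v) / 2

v = Decimal("1.4")
for _ in range(6):
    print(v)                 # current approximation of sqrt(2)
    v = babylonian_step(v)
```

Each step roughly doubles the number of correct digits, which agrees with the growth of the correct digits in the list of values above.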



FIGURE 2.2. Babylonian cuneiform tablet YBC 7289 from 1900 B.C. representing a
square of side 30, with diagonal given as 42, 25, 35 and ratio 1, 24, 51, 10 2


Next Step (Alkalsâdî around 1450, Briggs 1624). To improve (2.13), consider

$\sqrt{a + b} = \sqrt{a} + \frac{b}{2\sqrt{a}} + \delta,$

compute the square

$a + b = a + b + \frac{b^2}{4a} + 2\sqrt{a}\,\delta + \frac{b\,\delta}{\sqrt{a}} + \delta^2,$

neglect the last two terms, and obtain

(2.14)  $\sqrt{a + b} \approx \sqrt{a} + \frac{b}{2\sqrt{a}} - \frac{b^2}{8\sqrt{a^3}}.$

2 Reproduced with permission of Yale Babylonian Collection.

Example. For $\sqrt{2}$, we obtain this time as new approximation

$v + \frac{2 - v^2}{2v} - \frac{4 - 4v^2 + v^4}{8v^3} = \frac{3v}{8} + \frac{3}{2v} - \frac{1}{2v^3},$

the repeated use of which, starting with v = 1.4, gives rapid convergence:


1.4 

1.4142128 

1.41421356237309504870 

1.41421356237309504880168872420969807856967187537694807317643 . 
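
The faster convergence of the update obtained from (2.14) can be observed with a short computation. The sketch below again uses Python's `decimal` module; the name `third_order_step` and the precision of 70 digits are our own assumptions.

```python
from decimal import Decimal, getcontext

getcontext().prec = 70  # working precision in significant digits (our choice)

def third_order_step(v):
    """v -> 3v/8 + 3/(2v) - 1/(2v^3), the update for sqrt(2) derived from (2.14)."""
    return 3 * v / 8 + 3 / (2 * v) - 1 / (2 * v ** 3)

v = Decimal("1.4")
for _ in range(4):
    print(v)                 # current approximation of sqrt(2)
    v = third_order_step(v)
```

Here the number of correct digits roughly triples in each step, compared with the doubling obtained from (2.13).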


Equations (2.13) and (2.14) become noticeably neater if we divide them by
$\sqrt{a}$ and if b/a is replaced by x:

$(1 + x)^{1/2} \approx 1 + \frac{x}{2}, \qquad (1 + x)^{1/2} \approx 1 + \frac{x}{2} - \frac{x^2}{8}.$

In order to obtain a more precise approximation, we can continue the above calculations.
The result will be a series of the type

$(1 + x)^{1/2} = 1 + \frac{x}{2} + bx^2 + cx^3 + dx^4 + \ldots,$

whose coefficients b, c, d, . . . we want to determine. Inserting this series into the
relation $(1 + x)^{1/2}(1 + x)^{1/2} = 1 + x$ and comparing equal powers of x yields
b = −1/8, c = 1/16, d = −5/128, . . . . Consequently, we have the better approximation
(Newton 1665)

(2.15)  $(1 + x)^{1/2} = 1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \frac{5}{128}x^4 + \ldots$

We note that

$-\frac{1}{8} = -\frac{1 \cdot 1}{2 \cdot 4} = \frac{\frac12(\frac12 - 1)}{1 \cdot 2}, \qquad \frac{1}{16} = \frac{1 \cdot 1 \cdot 3}{2 \cdot 4 \cdot 6} = \frac{\frac12(\frac12 - 1)(\frac12 - 2)}{1 \cdot 2 \cdot 3},$

$-\frac{5}{128} = -\frac{1 \cdot 1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6 \cdot 8} = \frac{\frac12(\frac12 - 1)(\frac12 - 2)(\frac12 - 3)}{1 \cdot 2 \cdot 3 \cdot 4},$

which leads to the conjecture that Theorem 2.1 is also true for n = 1/2. The
sequence 1 + x/2, 1 + x/2 − x^2/8, . . . sketched in Fig. 2.3, illustrates the convergence
of (2.15) toward $\sqrt{1 + x}$ for −1 < x < 1.
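
The comparison of equal powers of x can be carried out mechanically with exact rational arithmetic. The following sketch (the function names are ours) determines the coefficients of the series from the relation that its square equals 1 + x, and compares them with the expression of Theorem 2.1 evaluated at n = 1/2.

```python
from fractions import Fraction

def sqrt_series_coefficients(n_terms):
    """Coefficients c_0, c_1, ... of (1 + x)^(1/2), found by comparing powers
    of x in (c_0 + c_1 x + c_2 x^2 + ...)^2 = 1 + x."""
    c = [Fraction(1)]
    for k in range(1, n_terms):
        target = Fraction(1) if k == 1 else Fraction(0)
        # coefficient of x^k in the square: 2*c_0*c_k + sum_{i=1}^{k-1} c_i*c_{k-i}
        cross = sum(c[i] * c[k - i] for i in range(1, k))
        c.append((target - cross) / 2)
    return c

def binomial_coefficient(a, j):
    """a(a-1)...(a-j+1) / (1*2*...*j) for a rational number a."""
    result = Fraction(1)
    for i in range(j):
        result *= (a - i) / Fraction(i + 1)
    return result

print(sqrt_series_coefficients(5))
print([binomial_coefficient(Fraction(1, 2), j) for j in range(5)])
```

Both lines print the same fractions 1, 1/2, −1/8, 1/16, −5/128, in agreement with (2.15).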


Arbitrary Rational Exponents. 

All this was in the two plague years of 1665 and 1666, for in those days 
I was in the prime of my age for invention, and minded mathematics 
and philosophy more than at any other time since. 

(Newton, quoted from Kline 1972, p. 357) 

One of Newton’s ideas of these “anni mirabiles”, inspired by the work of Wallis
(see the remark following Eq. (5.27)), was to try to interpolate the polynomials
$(1 + x)^0$, $(1 + x)^1$, $(1 + x)^2$, . . . in order to obtain a series for $(1 + x)^a$ where a is
some rational number. This means that we must interpolate the coefficients given
in Theorem 2.1 (see Fig. 2.4). Since the latter are polynomials in n, it is clear
that the result is given by the same expression with n replaced by a. We therefore
arrive at the general theorem.

FIGURE 2.3. Series for $(1 + x)^{1/2} = 1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \ldots$


(2.2) Theorem (Generalized binomial theorem of Newton). For any rational a we
have for |x| < 1

$(1 + x)^a = 1 + \frac{a}{1}\,x + \frac{a(a-1)}{1 \cdot 2}\,x^2 + \frac{a(a-1)(a-2)}{1 \cdot 2 \cdot 3}\,x^3 + \ldots$


FIGURE 2.4. Interpolation of Pascal’s triangle, Newton’s autograph (1665) 3

Even Newton found that his interpolation argument was dangerous. Euler, 
in his Introductio (1748, §71), stated the general theorem (“ex hoc theoremate 
universali”) without any further proof or comment. Only Abel, a century later, felt 
the need for a rigorous proof (see Sect. III.7 below).

3 Fig. 2.4 is reproduced with permission of Cambridge University Press.

Remark. This is precisely the formula that was engraved on Newton’s gravestone
in 1727 at Westminster Abbey. Don’t make useless efforts ... for the past hundred 
years the formula has been illegible. 


Exponential Function 


. . . ubi e denotat numerum, cuius logarithmus hyperbolicus est 1. 

(first definition of e; Euler 1736b, Mechanica, p. 60) 

Origins. 1. F. Debeaune (1601-1652) was the first reader of Descartes’ “Géométrie”
of 1637. A year later, he posed Descartes the following geometrical problem:
find a curve y(x) such that for each point P the distances between V and T, the
points where the vertical and the tangent line cut the x-axis, are always equal to
a given constant a (see Fig. 2.5a). Despite the efforts of Descartes and Fermat,
this problem remained unsolved for nearly 50 years. Leibniz (1684, “. . . tentavit,
sed non solvit”) then proposed the following solution (see Fig. 2.5b): Let x, y be
a given point. Then, increase x by a small increment b, so that y increases (due
to the similarity of two triangles) by yb/a. Repeating, we obtain a sequence of
values

$y, \quad \bigl(1 + \frac{b}{a}\bigr)y, \quad \bigl(1 + \frac{b}{a}\bigr)^2 y, \quad \bigl(1 + \frac{b}{a}\bigr)^3 y, \quad \ldots$

for the abscissae x, x + b, x + 2b, x + 3b, . . . .



2. Questions like “If the population in a certain region increases annually by
one thirtieth and at one time there were 100,000 inhabitants, what would be the
population after 100 years?” (Euler 1748, Introductio §110) or “A certain man
borrowed 400.000 florins at the usurious rate of five percent annual interest . . .”
(Introductio §111) lead to the computation of expressions such as

(2.16)  $\bigl(1 + \frac{1}{30}\bigr)^{100}, \qquad (1 + 0.05)^N, \qquad \text{or in general} \qquad (1 + \omega)^N,$

where ω is small and N is large.

Euler’s Number. Suppose first that ω = 1/N. We compute (2.16) with the help of
Theorem 2.1,

$\bigl(1 + \frac{1}{N}\bigr)^N = 1 + \frac{N}{N} + \frac{N(N-1)}{1 \cdot 2}\,\frac{1}{N^2} + \frac{N(N-1)(N-2)}{1 \cdot 2 \cdot 3}\,\frac{1}{N^3} + \ldots$

$= 1 + 1 + \frac{1\,(1 - \frac{1}{N})}{1 \cdot 2} + \frac{1\,(1 - \frac{1}{N})(1 - \frac{2}{N})}{1 \cdot 2 \cdot 3} + \ldots$

Here, Euler states without wincing that “if N is a number larger than any
assignable number, then (N − 1)/N is equal to 1”. This shows that as N tends to infinity,
$(1 + \frac{1}{N})^N$ tends to the so-called Euler number

(2.17)  $e = 1 + 1 + \frac{1}{1 \cdot 2} + \frac{1}{1 \cdot 2 \cdot 3} + \frac{1}{1 \cdot 2 \cdot 3 \cdot 4} + \ldots$

We emphasize that this argument is dangerous, because it is applied infinitely
often. For example, by a similar “proof” we would obtain

$1 = \frac{1}{2} + \frac{1}{2} = \frac{1}{3} + \frac{1}{3} + \frac{1}{3} = \ldots = \frac{1}{N} + \frac{1}{N} + \ldots + \frac{1}{N} = 0 + 0 + 0 + \ldots = 0.$

We shall return to this question in Sect. III.2. Table 2.1 compares the convergence
of the series with that of $(1 + \frac{1}{N})^N$.
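
The two quantities compared in Table 2.1 can be recomputed exactly with rational arithmetic. In the sketch below the names `compound` and `series` are ours; `series(N)` adds the terms of the series for e up to 1/(1·2·...·N).

```python
from fractions import Fraction

def compound(N):
    """(1 + 1/N)^N as an exact fraction."""
    return (1 + Fraction(1, N)) ** N

def series(N):
    """1 + 1 + 1/(1*2) + ... + 1/(1*2*...*N) as an exact fraction."""
    total, term = Fraction(1), Fraction(1)
    for k in range(1, N + 1):
        term /= k            # term is now 1/k!
        total += term
    return total

for N in (1, 2, 5, 10, 20):
    print(N, float(compound(N)), float(series(N)))
```

The printed values show the slow convergence of $(1 + \frac{1}{N})^N$ and the rapid convergence of the series.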


TABLE 2.1. Computation of e

 N   (1 + 1/N)^N   1 + 1/1! + 1/2! + ... + 1/N!
 1   2.000         2.0
 2   2.250         2.5
 3   2.370         2.66
 4   2.441         2.708
 5   2.488         2.7166
 6   2.522         2.71805
 7   2.546         2.718253
 8   2.566         2.7182787
 9   2.581         2.71828152
10   2.594         2.718281801
11   2.604         2.7182818261
12   2.613         2.71828182828
13   2.621         2.718281828446
14   2.627         2.7182818284582
15   2.633         2.71828182845899
16   2.638         2.7182818284590422
17   2.642         2.71828182845904507
18   2.646         2.718281828459045226
19   2.650         2.7182818284590452349
20   2.653         2.718281828459045235339
21   2.656         2.7182818284590452353593
22   2.659         2.718281828459045235360247
23   2.661         2.7182818284590452353602857
24   2.664         2.718281828459045235360287404
25   2.666         2.7182818284590452353602874687
26   2.668         2.71828182845904523536028747125
27   2.670         2.718281828459045235360287471349
28   2.671         2.71828182845904523536028747135254

FIGURE 2.6a. $(1 + \frac{x}{N})^N$

FIGURE 2.6b. $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots$


Powers of e. We next set ω = x/N in (2.16), where x is a fixed, say rational
number. That is to say that we simultaneously let N tend to infinity and ω to zero
in such a manner that their product remains equal to the constant x. Exactly the
same manipulation as above now leads to the result

(2.18)  $\bigl(1 + \frac{x}{N}\bigr)^N \to 1 + x + \frac{x^2}{1 \cdot 2} + \frac{x^3}{1 \cdot 2 \cdot 3} + \frac{x^4}{1 \cdot 2 \cdot 3 \cdot 4} + \ldots$

On the other hand, we set M = N/x, N = xM for those values of N such that
M is an integer. This gives, for N and M tending to infinity,

(2.19)  $\bigl(1 + \frac{x}{N}\bigr)^N = \bigl(1 + \frac{1}{M}\bigr)^{Mx} = \Bigl(\bigl(1 + \frac{1}{M}\bigr)^M\Bigr)^x \to e^x.$

On combining (2.18) and (2.19), we have the following theorem.


(2.3) Theorem (Euler 1748, Introductio §123, 125). For N tending to infinity,

$\bigl(1 + \frac{x}{N}\bigr)^N \to e^x = 1 + x + \frac{x^2}{1 \cdot 2} + \frac{x^3}{1 \cdot 2 \cdot 3} + \frac{x^4}{1 \cdot 2 \cdot 3 \cdot 4} + \ldots$   □


The convergence of these expressions to $e^x$ (also denoted by exp x) is illustrated
in Figs. 2.6a and 2.6b. The dotted line represents the exact function $e^x$.

Exercises


2.1 Verify the following formula (Euler 1755, Opera vol. X, p. 280) by using
$50 = 2 \cdot 5^2 = 7^2 + 1$:

$\sqrt{2} = \frac{7}{5}\Bigl(1 + \frac{1}{100} + \frac{1 \cdot 3}{100 \cdot 200} + \frac{1 \cdot 3 \cdot 5}{100 \cdot 200 \cdot 300} + \text{etc.}\Bigr),$

“quae ad computum in fractionibus decimalibus instituendum est aptissima”.
Add numerically five terms of this series.

Hint. Work with the series for $(1 - x)^{-1/2}$.

2.2 Show that the number, written in base 60 as 1, 25, is a good approximation
to $\sqrt{2}$. Show that one iteration of the “Babylonian square root algorithm”
deduced from formula (2.13) leads to 1, 24, 51, 10, . . . , the value of Fig. 2.2.

2.3 By multiplying the series

$(1 + x)^{1/3} = 1 + ax + bx^2 + cx^3 + \ldots$

with itself twice, determine the coefficients a, b, c, . . . to find

$(1 + x)^{1/3} = 1 + \frac{x}{3} - \frac{2}{3 \cdot 6}\,x^2 + \frac{2 \cdot 5}{3 \cdot 6 \cdot 9}\,x^3 - \ldots$

By using $2 \cdot 4^3 - 5^3 = 3$, obtain the formula

$\sqrt[3]{2} = \frac{5}{4}\Bigl(1 + \frac{1}{1 \cdot 125} - \frac{2}{1 \cdot 2 \cdot (125)^2} + \frac{2 \cdot 5}{1 \cdot 2 \cdot 3 \cdot (125)^3} - \frac{2 \cdot 5 \cdot 8}{1 \cdot 2 \cdot 3 \cdot 4 \cdot (125)^4} + \ldots\Bigr).$

Remark. The determination of $\sqrt[3]{2}$ was one of the great problems of Greek
mathematics (double the volume of the cube).

2.4 (Bernoulli’s inequality; Jac. Bernoulli 1689, see 1744, Opera, p. 380; Barrow
1670, see 1860, Works, Lectio VII, §XIII, p. 224). By induction on n, prove
that

$(1 + a)^n \ge 1 + na$   for $a \ge -1$, n = 0, 1, 2, . . .

$1 - na < (1 - a)^n < \frac{1}{1 + na}$   for 0 < a < 1, n = 2, 3, . . .

2.5 In order to study the convergence of $(1 + \frac{1}{n})^n$ to e, consider the sequences

$a_n = \bigl(1 + \frac{1}{n}\bigr)^n$   and   $b_n = \bigl(1 + \frac{1}{n}\bigr)^{n+1}.$

Show that

$a_1 < a_2 < a_3 < \ldots < e < \ldots < b_3 < b_2 < b_1$

and that $b_n - a_n < 4/n$.

Hint. Use the second inequality of Exercise 2.4 with $a = 1/n^2$.

I.3 Logarithms and Areas

Tabularum autem logarithmicarum amplissimus est usus . . .

(Euler 1748, Introductio, §110) 
Students usually find the concept of logarithms very difficult to understand. 

(B.L. van der Waerden 1957, p. 1) 

M. Stifel (1544) highlights the two series (see facsimile in Fig. 3.1)

0   1   2   3   4    5    6    7     8
1   2   4   8   16   32   64   128   256


FIGURE 3.1. Extracts from Stifel’s book (p. 237 and 250) 1


We see that passing from the lower to the upper line transforms products into sums. 
For example, instead of multiplying 8 by 32 “in inferiore ordine”, we take the cor- 
responding “logarithms” 3 and 5 “in superiore ordine”, compute their sum which 
is 8, return from there “in inferiore ordine”, and find the product 8 • 32 = 256. 
A more detailed table of this type would be of great use since additions are easier
than multiplications. Such “logarithmic” tables (λόγος is Greek for “word,
relation”, ἀριθμός means “number”, logarithms are therefore useful relations between
numbers) were first computed by John Napier (1614, 1619), Henry Briggs
(1624), and Jost Bürgi (1620).

(3.1) Definition. A function $\ell(x)$, defined for positive values of x, is called a
logarithmic function if for all x, y > 0

$\ell(x \cdot y) = \ell(x) + \ell(y).$

1 Reproduced with permission of Bibl. Publ. Univ. Genève.

If we set first y = z/x and then x = y = 1 in (3.1), we obtain

(3.2)  $\ell(z/x) = \ell(z) - \ell(x),$

(3.3)  $\ell(1) = 0.$

Applying (3.1) twice to $x \cdot y \cdot z = (x \cdot y) \cdot z$ gives

(3.4)  $\ell(x \cdot y \cdot z) = \ell(x) + \ell(y) + \ell(z),$

and similarly for products with four or more terms. Next, applying (3.4) to
$\sqrt[3]{x} \cdot \sqrt[3]{x} \cdot \sqrt[3]{x} = x$, we obtain $\ell(\sqrt[3]{x}) = \frac{1}{3}\ell(x)$, or in general

(3.5)  $\ell(x^{m/n}) = \frac{m}{n}\,\ell(x)$,   where $x^{m/n} = \sqrt[n]{x^m}$.


Bases. Let a fixed logarithmic function $\ell(x)$ be given and suppose that there exists
a number a for which $\ell(a) = 1$. Then, (3.5) becomes

(3.6)  $\ell(a^{m/n}) = \frac{m}{n},$

i.e., the logarithmic function is the inverse function for the exponential function
$a^x$. We call this the logarithm to the base a and write

(3.7)  $y = \log_a x$   if   $x = a^y$.

Logarithms to the base 10 (Briggs’ logarithms) are the most suitable for numerical
computations, since a shift of the decimal point just adds an integer to
the logarithm. The best base for theoretical work, as we soon shall see, is Euler’s
number e (natural or Naperian or hyperbolic logarithms). These logarithms are
usually denoted by ln x or log x.

Euler’s “Golden Rule”. If the logarithms for one base are known, the logarithms
for all other bases are obtained by a simple division. To see this, take the logarithm
to the base b of $x = a^y$ and use (3.7) and (3.5). This yields

(3.8)  $\log_b x = y \cdot \log_b a \quad\Rightarrow\quad y = \log_a x = \frac{\log_b x}{\log_b a}.$


Computation of Logarithms 

By computing the square root of the base a, then the square root of the square
root, and so on, and by multiplying all these values, we obtain, with the help of
(3.6) and (3.1), the logarithms of many numbers. This is illustrated for a = 10 in
Fig. 3.2.

There remains a problem: we would prefer to know the logarithms of such numbers
as 2, 3, 4, . . . and not of 4.2170 or 2.3714.

Briggs’ Method. Compute the root of 10, then the root of the root, and continue
doing so 54 times (see facsimile in Fig. 3.3). This gives, with $c = 1/2^{54}$,

(3.9a)  $10^c$ = 1.00000 00000 00000 12781 91493 20032 35 = 1 + a.

Then, compute in the same way the successive roots of 2:

(3.9b)  $2^c$ = 1.00000 00000 00000 03847 73979 65583 10 = 1 + b.

The value $x = \log_{10} 2$ we are searching for satisfies $2 = 10^x$. Hence,

$1 + b \overset{(3.9b)}{=} 2^c = (10^x)^c = (10^c)^x \overset{(3.9a)}{=} (1 + a)^x \overset{\text{(Theorem 2.2)}}{\approx} 1 + ax$

and we obtain

(3.10)  $\log_{10}(2) = x \approx \frac{b}{a} = \frac{3847739796558310}{12781914932003235} \approx 0.3010299956639812.$


This gives us one value. The amount of work necessary for the whole table is 
hardly imaginable. 
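
Briggs’ procedure is easy to imitate today. The sketch below uses Python's `decimal` module; the function name `repeated_sqrt_minus_one` and the working precision of 60 digits are our own choices, made so that the small quantities a and b of (3.9a) and (3.9b) keep enough significant digits.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough digits to resolve 1 + a with a of size 1e-16

def repeated_sqrt_minus_one(value, times=54):
    """Take the square root `times` times and return the excess over 1,
    i.e. value**(1/2**times) - 1."""
    r = Decimal(value)
    for _ in range(times):
        r = r.sqrt()
    return r - 1

a = repeated_sqrt_minus_one(10)   # quantity a of (3.9a)
b = repeated_sqrt_minus_one(2)    # quantity b of (3.9b)
log10_2 = b / a                   # approximation (3.10)
print(a)
print(b)
print(log10_2)
```

The quotient agrees with $\log_{10} 2$ to about 16 digits, because the terms neglected in $(1 + a)^x \approx 1 + ax$ are of the size of a itself.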


Interpolation. Interpolation was an important tool for speeding up the computation
of logarithms in ancient times. Say, for example, that four values of $\log_{10}$
have been computed. We compute the difference scheme

log(44) = 1.6434526765
                          0.0097598373
log(45) = 1.6532125138                  -0.0002145194
                          0.0095453179                   0.0000092277 .
log(46) = 1.6627578317                  -0.0002052917
                          0.0093400262
log(47) = 1.6720978579

This gives the interpolation polynomial (Theorem 1.1, shifted)

$p(x) = 1.6434526765 + (x - 44)\Bigl(0.0097598373 + \frac{x - 45}{2}\Bigl(-0.0002145194 + \frac{x - 46}{3} \cdot 0.0000092277\Bigr)\Bigr),$


for which some selected values with errors are given in Table 3.1. The results are 
quite good despite the ease of computations. By adding additional points, one can 
increase the precision whenever this is desired. 


TABLE 3.1. Errors of interpolation polynomial

x        p(x)           log_10(x)      err
44.25    1.645913252    1.645913275     2.34 · 10^-8
44.50    1.648359987    1.648360011     2.42 · 10^-8
44.75    1.650793026    1.650793040     1.35 · 10^-8
45.25    1.655618594    1.655618584    -1.05 · 10^-8
45.50    1.658011411    1.658011397    -1.43 · 10^-8
45.75    1.660391109    1.660391098    -1.04 · 10^-8
46.25    1.665111724    1.665111737     1.32 · 10^-8
46.50    1.667452930    1.667452953     2.34 · 10^-8
46.75    1.669781593    1.669781615     2.24 · 10^-8
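
The entries of Table 3.1 can be checked with a few lines of code. The sketch below (the function name `newton_forward` is ours) builds the forward differences of the four tabulated logarithms and evaluates the interpolation polynomial in the binomial form $y_0 + \frac{t}{1}\Delta y_0 + \frac{t(t-1)}{1 \cdot 2}\Delta^2 y_0 + \ldots$ with t = x − 44.

```python
from math import log10

def newton_forward(x0, y_values, x):
    """Interpolation polynomial through the points (x0 + i, y_values[i]),
    evaluated at x by means of forward differences."""
    diffs = list(y_values)
    leading = []                       # y_0, Δy_0, Δ^2 y_0, ...
    while diffs:
        leading.append(diffs[0])
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    t = x - x0
    result, factor = 0.0, 1.0          # factor = t(t-1)...(t-j+1) / j!
    for j, d in enumerate(leading):
        result += factor * d
        factor *= (t - j) / (j + 1)
    return result

ys = [1.6434526765, 1.6532125138, 1.6627578317, 1.6720978579]
for x in (44.25, 45.50, 46.75):
    p = newton_forward(44, ys, x)
    print(x, round(p, 9), log10(x) - p)
```

The last column printed is the error, which stays of the size $10^{-8}$ as in the table.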


Before going on with the calculus of logarithms, we make a little excursion into 
geometry. 


Computation of Areas 

The determination of areas and volumes exercised the curiosity of mathematicians 
since Greek antiquity. Two of the greatest achievements of Archimedes (287-212
B.C.) were the computation of the area of the parabola and of the circle. The early
17th century then saw the computation of areas under the curve $y = x^a$ with either
integer or arbitrary values of a (Bonaventura Cavalieri, Roberval, Fermat).

Problem. Given a, find the area below the curve $y = x^a$ between the bounds
x = 0 and x = B.

Solution (Fermat 1636). We choose θ < 1 but close to 1 and consider the rectangles
formed by the geometric progression $B, \theta B, \theta^2 B, \theta^3 B, \ldots$ (Fig. 3.4b), of
height $B^a, \theta^a B^a, \theta^{2a} B^a, \theta^{3a} B^a, \ldots$ . Then, the area can be approximated by the
geometrical series

1st Rect. + 2nd Rect. + 3rd Rect. + . . .

$= B(1 - \theta)B^a + B(\theta - \theta^2)\theta^a B^a + B(\theta^2 - \theta^3)\theta^{2a} B^a + \ldots$

(3.12)  $= B^{a+1}(1 - \theta)\bigl(1 + \theta^{a+1} + \theta^{2a+2} + \ldots\bigr) = B^{a+1}\,\frac{1 - \theta}{1 - \theta^{a+1}},$

if a + 1 > 0 or, equivalently, a > −1 (see Eq. (2.12)). Let θ = 1 − ε with ε small.
Then, 1 − θ = ε, $\theta^{a+1} = 1 - (a + 1)\varepsilon + \ldots$ by Theorem 2.2. Consequently,

$\frac{1 - \theta}{1 - \theta^{a+1}} \approx \frac{\varepsilon}{(a + 1)\varepsilon} = \frac{1}{a + 1}.$

The sum of the rectangles (3.12) approximates (for a > −1) the area S from
above. If we replace the heights of the rectangles by $\theta^a B^a, \theta^{2a} B^a, \ldots$ we get an
approximation of S from below. In this situation, the value (3.12) is just multiplied
by $\theta^a$, which, for θ → 1, tends to 1. Therefore, both approximations tend to the
same value and we get the following result.


(3.2) Theorem (Fermat 1636). The area below the curve $y = x^a$ and bounded by
x = 0 and x = B is given by

$S = \frac{B^{a+1}}{a + 1}$   if   a > −1.   □
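
Fermat’s construction can be imitated numerically. In the following sketch the function names and the number of rectangles are our own choices: the rectangles over the geometric progression B, θB, θ²B, . . . are added up and compared with the closed form (3.12) and with the limit $B^{a+1}/(a + 1)$.

```python
def fermat_sum(a, B, theta, n_rect=50_000):
    """Sum of the rectangles over [theta^(k+1) B, theta^k B] with height
    (theta^k B)^a, for k = 0, 1, ..., n_rect - 1."""
    total = 0.0
    right = B
    for _ in range(n_rect):
        left = theta * right
        total += (right - left) * right ** a
        right = left
    return total

def fermat_closed_form(a, B, theta):
    """Value (3.12) of the geometrical series, valid for a > -1."""
    return B ** (a + 1) * (1 - theta) / (1 - theta ** (a + 1))

for theta in (0.9, 0.99, 0.999):
    print(theta, fermat_sum(2, 1.0, theta), fermat_closed_form(2, 1.0, theta))
```

For a = 2 and B = 1 the printed values approach 1/3 as θ approaches 1.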


Area of the Hyperbola and Natural Logarithms 

In the month of September 1668, Mercator published his Logarithmotech- 
nia, which contains an example of this method (i.e., of infinite series) in a 
single case, namely the quadrature of the hyperbola. 

(Letter of Collins, July 26, 1672) 

Fermat’s method does not apply to a hyperbola y = 1/x. In fact, the geometric
sequence of abscissae $B, \theta B, \theta^2 B, \theta^3 B, \ldots$ becomes, for the areas, the sum
(1 − θ)(1 + 1 + 1 + . . .), whose partial sums form an arithmetic progression.
This motivates the following discovery (made by Gregory of St. Vincent in 1647
and Alfons Anton de Sarasa in 1649; see Kline 1972, p. 354): the area below the
hyperbola y = 1/x is a logarithm (see Fig. 3.5).

3 Fermat’s portrait is reproduced with permission of Bibl. Math. Univ. Genève.

We observe (by contracting the x-coordinates and stretching the y-coordinates)
that, e.g., Area (3 → 6) = Area (1 → 2). Therefore,

Area (1 → 3) + Area (1 → 2) = Area (1 → 6).

This means that the function ln(a) = Area (1 → a) satisfies the identity

ln(a) + ln(b) = ln(a · b)

and is therefore a logarithm (the “natural” logarithm).



Mercator’s Series. After a shift of the origin by 1 we have that ln(1 + a) is the area
below 1/(1 + x) between 0 and a. We substitute $1/(1 + x) = 1 - x + x^2 - x^3 + \ldots$
(formula (2.12)) and insert for the areas below $1, x, x^2, \ldots$ between 0 and a the
expressions of Theorem 3.2:

$a, \quad \frac{a^2}{2}, \quad \frac{a^3}{3}, \quad \frac{a^4}{4}, \quad \ldots$

(see Fig. 3.6). In this way, we find, after replacing a by x (N. Mercator 1668),

(3.13)  $\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots$

The convergence of this series for various values of x is shown in Fig. 3.7. With
the value x = 1 this series becomes

(3.13a)  $\ln 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \ldots$

Gregory’s Series. Replace x in (3.13) by −x:

(3.14)  $\ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \ldots$

and then subtract this equation from (3.13). This gives (Gregory 1668)

(3.15)  $\ln\frac{1 + x}{1 - x} = 2\Bigl(x + \frac{x^3}{3} + \frac{x^5}{5} + \frac{x^7}{7} + \ldots\Bigr).$

Examples. Putting x = 1/2 in (3.14) and x = 1/3 in (3.15) we obtain the following
two series for ln 2:

(3.14a)  $\ln 2 = \frac{1}{2} + \frac{1}{2 \cdot 2^2} + \frac{1}{3 \cdot 2^3} + \frac{1}{4 \cdot 2^4} + \ldots$

(3.15a)  $\ln 2 = 2\Bigl(\frac{1}{3} + \frac{1}{3 \cdot 3^3} + \frac{1}{5 \cdot 3^5} + \frac{1}{7 \cdot 3^7} + \ldots\Bigr).$

TABLE 3.2. Convergence of the series for ln 2

 n   (3.13a)   (3.14a)     (3.15a)
 1   1.000     0.500       0.667
 2   0.500     0.625       0.6914
 3   0.833     0.667       0.69300
 4   0.583     0.6823      0.693135
 5   0.783     0.6885      0.6931460
 6   0.617     0.6911      0.69314707
 7   0.760     0.69226     0.693147170
 8   0.635     0.69275     0.6931471795
 9   0.746     0.69297     0.693147180459
10   0.646     0.693065    0.6931471805498
11   0.737     0.693109    0.6931471805589
12   0.653     0.693130    0.69314718055984


The performance of these three series (3.13a), (3.14a), (3.15a) for ln 2 is compared
in Table 3.2. It is obvious which one is best.
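
The three columns of Table 3.2 can be recomputed from the partial sums of the series: Mercator’s series at x = 1, series (3.14) at x = 1/2 (with the sign changed), and Gregory’s series at x = 1/3. This is a sketch in ordinary floating-point arithmetic; the function names are ours.

```python
from math import log

def mercator_at_1(n):
    """First n terms of (3.13a): 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

def half_series(n):
    """First n terms of (3.14a): 1/2 + 1/(2*2^2) + 1/(3*2^3) + ..."""
    return sum(1 / (k * 2 ** k) for k in range(1, n + 1))

def gregory_third(n):
    """First n terms of (3.15a): 2 (1/3 + 1/(3*3^3) + 1/(5*3^5) + ...)"""
    return 2 * sum(1 / ((2 * k + 1) * 3 ** (2 * k + 1)) for k in range(n))

for n in range(1, 13):
    print(n, mercator_at_1(n), half_series(n), gregory_third(n))
print("ln 2 =", log(2))
```

The output makes the difference in speed visible: after 12 terms the first series has barely one correct digit, the third one about thirteen.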


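Computation of ln p for Primes ≥ 3. Because of (3.1), it is sufficient to compute
the logarithms of the prime numbers. The logarithms of composite integers and
rational numbers are then obtained by addition and subtraction. The idea is to
divide p by a number close to it for which the logarithm is already known. Then,
we can apply series (3.15) with a small value of x and obtain rapid convergence.
For example, for p = 3 we write 3 = (3/2) · 2. Then

$\frac{1 + x}{1 - x} = \frac{3}{2} \quad\Rightarrow\quad x = \frac{1}{5},$

so that

(3.16)  $\ln 3 = \ln\frac{3}{2} + \ln 2 = \ln\frac{1 + \frac{1}{5}}{1 - \frac{1}{5}} + \ln 2.$

Another possibility is 3 = (3/4) · 4, which leads to

$\ln 3 = 2\ln 2 - \ln\frac{1 + \frac{1}{7}}{1 - \frac{1}{7}}.$

Still better is the use of the geometric mean of the above expressions, i.e., $3 = \sqrt{2 \cdot 4 \cdot \frac{9}{8}}$:

$\ln 3 = \frac{3}{2}\ln 2 + \frac{1}{2}\ln\frac{1 + \frac{1}{17}}{1 - \frac{1}{17}}.$

In the same way,

$\ln 5 = \frac{3}{2}\ln 2 + \frac{1}{2}\ln 3 + \frac{1}{2}\ln\frac{1 + \frac{1}{49}}{1 - \frac{1}{49}},$

$\ln 7 = 2\ln 2 + \frac{1}{2}\ln 3 + \frac{1}{2}\ln\frac{1 + \frac{1}{97}}{1 - \frac{1}{97}},$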



and so on. The larger p is, the better the series (3.15) converges. The first values 
obtained in this way are 

ln(l) = 0.000000000000000000000000000000 
ln(2) = 0.693147180559945309417232121458 
ln(3) = 1.098612288668109691395245236923 
ln(4) = 1.386294361119890618834464242916 
ln(5) = 1.609437912434100374600759333226 
ln(6) = 1.791759469228055000812477358381 
ln(7) = 1.945910149055313305105352743443 
ln(8) = 2.079441541679835928251696364375 
ln(9) = 2.197224577336219382790490473845 
ln(10) = 2.302585092994045684017991454684. 
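
The first of these values can be reproduced, to the accuracy of ordinary floating-point numbers, with Gregory’s series (3.15) and the decompositions $3 = \sqrt{2 \cdot 4 \cdot 9/8}$, $5 = \sqrt{4 \cdot 6 \cdot 25/24}$ and $7 = \sqrt{6 \cdot 8 \cdot 49/48}$. The sketch below is only an illustration; the function name `gregory_log` and the numbers of terms are our own choices.

```python
from math import log

def gregory_log(x, terms=12):
    """ln((1 + x) / (1 - x)) = 2 (x + x^3/3 + x^5/5 + ...), series (3.15)."""
    return 2 * sum(x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

ln2 = gregory_log(1 / 3, terms=40)                          # 2 = (1 + 1/3) / (1 - 1/3)
ln3 = 1.5 * ln2 + 0.5 * gregory_log(1 / 17)                 # 9/8   corresponds to x = 1/17
ln5 = 1.5 * ln2 + 0.5 * ln3 + 0.5 * gregory_log(1 / 49)     # 25/24 corresponds to x = 1/49
ln7 = 2 * ln2 + 0.5 * ln3 + 0.5 * gregory_log(1 / 97)       # 49/48 corresponds to x = 1/97

for name, value, p in (("ln 2", ln2, 2), ("ln 3", ln3, 3), ("ln 5", ln5, 5), ("ln 7", ln7, 7)):
    print(name, value, log(p))
```

With x as small as 1/49 or 1/97, three or four terms of the series already exhaust double precision.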

The improvement of this calculation (compared to that of Briggs), achieved in 
only a few decades (from 1620 to 1670), is obviously spectacular. It demonstrates 
once again the enormous progress made in mathematics after the appearance of 
Descartes’ Geometry. 

Connection with Euler’s Number. The connection between the natural logarithm 
and e is established in the following theorem. 

(3.3) Theorem. The natural logarithm ln x is the logarithm to base e. 

Proof. We apply the natural logarithm to the formula of Theorem 2.3. This gives, 
using (3.5) and (3.13), 

$\ln\bigl(1 + \frac{x}{N}\bigr)^N = N \cdot \ln\bigl(1 + \frac{x}{N}\bigr) = N \cdot \Bigl(\frac{x}{N} - \frac{x^2}{2N^2} + \ldots\Bigr) \to x,$


so that $\ln e^x = x$. □

We thus obtain a geometric interpretation of e: it is the number for which the 
area under the hyperbola y = 1/x between 1 and e is equal to 1 (see Fig. 3.8). 





FIGURE 3.8. Geometric meaning of e






FIGURE 3.9a. The functions $y = x^a$          FIGURE 3.9b. The functions $y = a^x$


Arbitrary Powers. Logarithms allow us to compute (and define) arbitrary powers
as follows (Joh. Bernoulli 1697, Principia Calculi Exponentialium, Opera, vol. I,
p. 179): we use $a = e^{\ln a}$ and get

(3.19)  $a^b = (e^{\ln a})^b = e^{b \ln a}.$


Graphs of these functions, considered either as a function of a or as a function of 
b, are sketched in Figs. 3.9a and 3.9b. 


Exercises 


3.1 (Newton 1671, Method of Fluxions, Euler 1748, Introductio, §123). Show
that 2 = (4/3) · (3/2) yields

$\ln 2 = \ln\frac{1 + \frac{1}{7}}{1 - \frac{1}{7}} + \ln\frac{1 + \frac{1}{5}}{1 - \frac{1}{5}}, \qquad \ln 3 = \ln\frac{1 + \frac{1}{5}}{1 - \frac{1}{5}} + \ln 2,$

which allows the simultaneous calculation of ln 2 and ln 3 by two rapidly
convergent series (3.15).

3.2 (Newton 1669, “Inventio Basis ex Area data”). Suppose that the area z under
the hyperbola is given by the formula

$z = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \frac{1}{5}x^5 - \ldots$

Find a series for $x = e^z - 1$ of the form

$x = z + a_2 z^2 + a_3 z^3 + a_4 z^4 + \ldots$

and (re)discover the series for the exponential function.

I.4 Trigonometric Functions

Sybil: It goes back to the dawn of civilization. 

(J. Cleese & C. Booth 1979, Fawlty Towers, The Psychiatrists) 

Measuring Angles. One of the oldest interests in geometry is the measurement 
of angles, mainly for astronomical purposes. The Babylonians divided the circle 
into 360°, probably because this was the approximate number of days in the year. 
Half the circle would then be 180°, the right angle 90°, and the equilateral triangle 
has angles of 60° (see Fig. 4.1a). Ptolemy 4 , in his Almagest, A.D. 150, refined the 
measurements by including the next digits in the number system in base 60, then in 
vogue, partes minutae primae (first small subdivisions) and partes minutae secon- 
dae (second small subdivisions). These became our “minutes” and “seconds”. But 
360° is not the only possibility. Many other units can be used; e.g., in some tech- 
nical applications we have grades, where the right angle has 100 grades. However, 
as for logarithms, there is a natural measure, based on the arc length of a circle 
of radius 1, the radian (see Fig. 4.1b). Here, the arc length of half of the circle is, 
with the precision computed by Th. F. de Lagny in 1719 and reproduced by Euler 
(with an error in the 113th decimal place, which is corrected here), 

3.14159265358979323846264338327950288419716939937510 
58209749445923078164062862089986280348253421170679
821480865132823066470938446 .... 

For this somewhat unwieldy expression W. Jones (1706, p. 243) introduced the
abbreviation π (“periphery”). Then the angle of 54° drawn in Fig. 4.1 measures
54π/180 = 0.9425 radians.



FIGURE 4.1a. Babylonian degrees



Definition of Trigonometric Functions. How can one measure an angle with a 
rigid ruler? Well, we can only measure the chord (see Fig. 4.2), and then, with the 
help of tables, try to find the angle, or vice versa. Such tables have their origin 
in Greek antiquity (Hipparchus 150 B.C. (lost) and Ptolemy A.D. 150). The sine 
function, which is connected to the chord function by sin α = (1/2) chord(2α),
has its origin in Indian (Brahmagupta around 630) and medieval European science
(Regiomontanus 1464). This function, originally named sinus rectus (i.e., vertical
sine), is much better adapted to the computation of triangles than the chord function.

4 = Πτολεμαῖος, Ptolemeus, Ptolemäus, Ptolemée, Tolomeo, Птолемей, ....



(4.1) Definition. Consider a right-angled triangle disposed in a circle of radius
1 as shown in Fig. 4.3. Then, the length of the leg opposite angle α is denoted by
sin α, that of the adjacent leg by cos α. Their quotients, which are the lengths of
the vertical and horizontal tangents to the circle, are

$$\tan\alpha = \frac{\sin\alpha}{\cos\alpha} \qquad\text{and}\qquad \cot\alpha = \frac{\cos\alpha}{\sin\alpha}.$$

These definitions apply immediately to an arbitrary right-angled triangle with
hypotenuse c and other sides a, b (with a opposite angle α):

$$a = c\cdot\sin\alpha, \qquad b = c\cdot\cos\alpha, \qquad a = b\cdot\tan\alpha. \tag{4.1}$$

While in geometry angles are traditionally denoted by lowercase Greek let- 
ters, as soon as we pass to radians and to the consideration of functions of a real 
variable (see the plots in Fig. 4.4), we prefer lowercase Latin letters (e.g., x) for 
the argument. Many formulas can be deduced from these figures, such as 


$$\sin 0 = 0, \quad \cos 0 = 1, \quad \sin(\pi/2) = 1, \quad \cos(\pi/2) = 0, \quad \sin\pi = 0,$$

$$\sin(-x) = -\sin x, \qquad \cos(-x) = \cos x, \tag{4.2a}$$

$$\sin(x+\pi) = -\sin x, \qquad \cos(x+\pi) = -\cos x, \tag{4.2b}$$

$$\sin(x+\pi/2) = \cos x, \qquad \cos(x+\pi/2) = -\sin x, \tag{4.2c}$$

$$\sin^2 x + \cos^2 x = 1. \tag{4.2d}$$

The functions sin x and cos x are periodic with period 2π; tan x is periodic with
period π.

Fig. 4.5 reproduces a drawing of the sine curve on page 17 of A. Dürer’s
Underweysung der Messung (1525). Dürer calls this curve “eynn schraufen lini”
and claims it is useful for stonemasons who construct circular staircases.



Curious geometrical patterns arise when sin n is plotted for integer values of 
n only (Fig. 4.6, see Strang 1991, Richert 1992). 



FIGURE 4.6. Values of sin 1, sin 2, sin 3, . . . with n in logarithmic scale


2 Reproduced with permission of Dr. Alfons Uhl Verlag, Nördlingen.

Basic Relations and Consequences


These equations have a venerable age. Already Ptolemy deduces . . . 

(L. Vietoris, J. reine ang. Math. vol. 186 (1949), p. 1) 

Let α and β be two angles with arcs x and y, respectively.


(4.2) Theorem (Ptolemy A.D. 150, Regiomontanus 1464).

$$\sin(x+y) = \sin x\cos y + \cos x\sin y \tag{4.3}$$

$$\cos(x+y) = \cos x\cos y - \sin x\sin y. \tag{4.4}$$


Proof. These relations can be seen directly for 0 < x, y < π/2 by inspecting the
three right-angled triangles in Fig. 4.7. All other configurations can be reduced to
this interval with the use of formulas (4.2b) and (4.2c). □



FIGURE 4.7. Proof of formulas (4.3) and (4.4)


By dividing the two equations of Theorem 4.2, we obtain

$$\tan(x+y) = \frac{\sin x\cos y + \cos x\sin y}{\cos x\cos y - \sin x\sin y} = \frac{\tan x + \tan y}{1 - \tan x\tan y}. \tag{4.5}$$


Further Formulas. Replacing y by −y in (4.3) and (4.4) yields

$$\sin(x-y) = \sin x\cos y - \cos x\sin y \tag{4.3'}$$

$$\cos(x-y) = \cos x\cos y + \sin x\sin y. \tag{4.4'}$$

If we add relations (4.3) and (4.3') we obtain sin(x + y) + sin(x − y) =
2 sin x cos y. Introducing new variables for x + y and x − y, namely

$$x + y = u, \quad x - y = v \qquad\text{or equivalently}\qquad x = (u+v)/2, \quad y = (u-v)/2,$$

we obtain the first of the following three formulas:

$$\sin u + \sin v = 2\cdot\sin\left(\frac{u+v}{2}\right)\cdot\cos\left(\frac{u-v}{2}\right) \tag{4.6}$$

$$\cos u + \cos v = 2\cdot\cos\left(\frac{u+v}{2}\right)\cdot\cos\left(\frac{u-v}{2}\right) \tag{4.7}$$

$$\cos v - \cos u = 2\cdot\sin\left(\frac{u+v}{2}\right)\cdot\sin\left(\frac{u-v}{2}\right). \tag{4.8}$$

The others are obtained similarly. 

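Putting x = y in (4.3) and (4.4) gives

$$\sin(2x) = 2\sin x\cos x \tag{4.9}$$

$$\cos(2x) = \cos^2 x - \sin^2 x = 1 - 2\sin^2 x = 2\cos^2 x - 1. \tag{4.10}$$

If we replace x by x/2 in (4.10) we obtain

$$\sin\frac{x}{2} = \pm\sqrt{\frac{1-\cos x}{2}}, \qquad \cos\frac{x}{2} = \pm\sqrt{\frac{1+\cos x}{2}}. \tag{4.11}$$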


Some Values for sin and cos. The proportions of the equilateral triangle and of
the regular square give sin and cos for the angles of 30°, 60°, and of 45°. For the
regular pentagon see the figure (Hippasus 450 B.C.): the triangles ACE and AEF
being similar, we have 1 + 1/x = x, which implies that x = (1 + √5)/2, i.e., the
point F divides the diagonal CA in the golden section (see Euclid, 13th Element,
§8); thus we find that sin 18° = 1/(2x). A list of the values obtained is given in
Table 4.1. For a complete list of sin α for α = 3°, 6°, 9°, 12° . . . see Lambert (1770c).

TABLE 4.1. Particular values for sin, cos, and tan

  α     radians   sin α                 cos α                 tan α
  0°    0         0                     1                     0
  15°   π/12      (√2/4)(√3 − 1)        (√2/4)(√3 + 1)        2 − √3
  18°   π/10      (√5 − 1)/4            (1/2)√((5 + √5)/2)    (3√5 − 5)√(5 + √5)/(10√2)
  30°   π/6       1/2                   √3/2                  √3/3
  36°   π/5       (1/2)√((5 − √5)/2)    (√5 + 1)/4            √(5 − √5)(√5 − 1)/(2√2)
  45°   π/4       √2/2                  √2/2                  1
  60°   π/3       √3/2                  1/2                   √3
  75°   5π/12     (√2/4)(√3 + 1)        (√2/4)(√3 − 1)        2 + √3
  90°   π/2       1                     0                     ∞

De Moivre’s Formulas. By replacing y by nx in (4.3) and (4.4) we get the
recurrence relations

$$\sin(n+1)x = \sin x\cos nx + \cos x\sin nx, \tag{4.12}$$

$$\cos(n+1)x = \cos x\cos nx - \sin x\sin nx. \tag{4.13}$$

Starting from (4.9) and (4.10) and applying (4.12) and (4.13) repeatedly, we find

$$\begin{aligned}
\cos(3x) &= \cos^3 x - 3\sin^2 x\cos x \\
\sin(3x) &= 3\sin x\cos^2 x - \sin^3 x \\
\cos(4x) &= \cos^4 x - 6\sin^2 x\cos^2 x + \sin^4 x \\
\sin(4x) &= 4\sin x\cos^3 x - 4\sin^3 x\cos x \\
\cos(5x) &= \cos^5 x - 10\sin^2 x\cos^3 x + 5\sin^4 x\cos x \\
\sin(5x) &= 5\sin x\cos^4 x - 10\sin^3 x\cos^2 x + \sin^5 x.
\end{aligned}$$
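
The repeated application of the recurrence relations (4.12) and (4.13) can be sketched in a few lines of Python. This is only an illustration under our own naming (`sin_cos_multiples` is a hypothetical helper): it starts from sin x and cos x and produces sin kx and cos kx for k = 1, ..., n, which we then compare with the closed forms for n = 5 listed above.

```python
import math

def sin_cos_multiples(x: float, n_max: int):
    """Return ([sin(kx)], [cos(kx)]) for k = 1..n_max via (4.12) and (4.13)."""
    s1, c1 = math.sin(x), math.cos(x)
    sines, cosines = [s1], [c1]
    for _ in range(n_max - 1):
        sn, cn = sines[-1], cosines[-1]
        sines.append(s1 * cn + c1 * sn)      # (4.12)
        cosines.append(c1 * cn - s1 * sn)    # (4.13)
    return sines, cosines

x = 0.3
sines, cosines = sin_cos_multiples(x, 5)
s, c = math.sin(x), math.cos(x)
# Compare the recurrence with the explicit expressions for n = 5
print(cosines[4], c**5 - 10 * s**2 * c**3 + 5 * s**4 * c)
print(sines[4], 5 * s * c**4 - 10 * s**3 * c**2 + s**5)
```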


Here we discover the appearance of Pascal’s triangle; the computation is precisely
the same as in Sect. I.2 (Theorem 2.1). Thus, we are able to state the following
general formulas (found by de Moivre 1730, see Euler 1748, Introductio §133):

$$\begin{aligned}
\cos nx &= \cos^n x - \frac{n(n-1)}{1\cdot 2}\sin^2 x\cos^{n-2} x + \frac{n(n-1)(n-2)(n-3)}{1\cdot 2\cdot 3\cdot 4}\sin^4 x\cos^{n-4} x - \ldots \\
\sin nx &= n\sin x\cos^{n-1} x - \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\sin^3 x\cos^{n-3} x + \frac{n(n-1)(n-2)(n-3)(n-4)}{1\cdot 2\cdot 3\cdot 4\cdot 5}\sin^5 x\cos^{n-5} x - \ldots
\end{aligned} \tag{4.14}$$

Series Expansions

Sit arcus z infinite parvus; erit sin z = z et cos z = 1; ...

(Euler 1748, Introductio, §134) 

While all the above formulas (4.5) through (4.14) have been derived only with the
use of (4.3) and (4.4) together with (4.2a), we now need a new basic hypothesis:
when x tends to zero, the “sinus rectus” merges with the arc. Since we are measuring
the angle in radians, it follows that the closer x is to zero, the better sin x is
approximated by x. We write this as

$$\sin x \approx x \qquad\text{for } x \to 0. \tag{4.15}$$


We now apply the same idea as in the proof of Eqs. (2.18) and (2.19): in de
Moivre’s formulas (4.14), we set x = y/N, n = N, where y is a fixed value,
while N tends to infinity and x tends to zero. Then, because of (4.15), we replace
sin x by x and cos x by 1. Also, since N → ∞, all terms (1 − k/N) become
1. This then leads to the formulas, in which we again write x for the variable y
(Newton 1669, Leibniz 1691, Jac. Bernoulli 1702),

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots \tag{4.16}$$

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots \tag{4.17}$$

Newton’s derivation of these series is indicated in Exercise 4.1; the above proof is
due to Jac. Bernoulli and also appears in Euler’s Introductio, §134.

Remark. Some care is necessary when replacing cos(y/N) by 1 for large values
of N, because this expression is raised to the Nth power. For example, 1 + y/N
tends to 1 for N → ∞, but $(1 + y/N)^N$ does not (see Theorem 2.3). Rescue
comes from the fact that cos(y/N) tends to 1 faster than 1 + y/N. Indeed, we
have

$$\cos^N(y/N) = \left(1 - \sin^2(y/N)\right)^{N/2} \approx 1 - \frac{y^2}{2N} \to 1$$

by (4.2d), Theorem 2.2, and (4.15).

The convergence of the series (4.16) and (4.17) is illustrated in Fig. 4.8. We 
apparently have convergence for all x (see Sect. III.7). It can be observed (the com- 
putations were intentionally done in single precision) that problems of numerical 
precision due to rounding errors arise beyond x = 15. 
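
The effect of rounding errors on the partial sums of (4.17) can be explored with the following Python sketch. It is not the computation behind Fig. 4.8; we merely emulate single precision by rounding every intermediate result to the nearest 32-bit floating point number with the standard `struct` module, which is an assumption of this illustration. The names `to_single` and `sin_series` are ours.

```python
import math
import struct

def to_single(v: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def sin_series(x: float, terms: int, single: bool = False) -> float:
    """Partial sum of (4.17): x - x^3/3! + x^5/5! - ... with `terms` terms."""
    rnd = to_single if single else (lambda v: v)
    x = rnd(x)
    term, total = x, x
    for k in range(1, terms):
        # next term = -(previous term) * x^2 / ((2k)(2k+1))
        term = rnd(-term * x * x / ((2 * k) * (2 * k + 1)))
        total = rnd(total + term)
    return total

for x in (1.0, 5.0, 15.0, 25.0):
    print(x, math.sin(x), sin_series(x, 60), sin_series(x, 60, single=True))
```

For small x all three columns agree well; for larger x the intermediate terms become huge before they cancel, and the emulated single-precision sum loses its accuracy.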

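The Series for tan x. We put

$$y = \tan x = \frac{\sin x}{\cos x} = a_1 x + a_3 x^3 + a_5 x^5 + a_7 x^7 + \ldots .$$

Multiplying by cos x, inserting the series (4.16) and (4.17), and comparing the
coefficients of $x$, $x^3$, $x^5$ gives

$$a_1 = 1, \qquad a_3 = -\frac{1}{6} + \frac{1}{2} = \frac{1}{3}, \qquad a_5 = \frac{1}{120} - \frac{1}{24} + \frac{1}{6} = \frac{2}{15}.$$

If we continue, we find the series

$$\tan x = x + \frac{x^3}{3} + \frac{2x^5}{15} + \frac{17x^7}{315} + \frac{62x^9}{2835} + \frac{1382x^{11}}{155925} + \frac{21844x^{13}}{6081075} + \ldots \tag{4.18}$$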


No general rule is visible. However, there is one, based on the Bernoulli numbers
(1.29) (see Exercise 10.2 of Sect. II.10).
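
The comparison of coefficients that produces (4.18) can be carried out mechanically. The following Python sketch uses exact rational arithmetic and the relation sin x = tan x · cos x; it is our own illustration (the function name `tan_coefficients` is hypothetical) and does not use the Bernoulli numbers.

```python
from fractions import Fraction
from math import factorial

def tan_coefficients(order: int):
    """Coefficients t[0..order] of the power series of tan x."""
    # Series of sin x (odd powers) and cos x (even powers)
    sin_c = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 else Fraction(0)
             for n in range(order + 1)]
    cos_c = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 0 else Fraction(0)
             for n in range(order + 1)]
    t = []
    for n in range(order + 1):
        # Coefficient of x^n in sin x = tan x * cos x, using cos_c[0] = 1
        t.append(sin_c[n] - sum(cos_c[k] * t[n - k] for k in range(1, n + 1)))
    return t

t = tan_coefficients(13)
print([str(t[n]) for n in range(1, 14, 2)])
```

The printed fractions can be compared with the coefficients in (4.18).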


Ancient Computations of Tables. From the values of Table 4.1, which have been known
since antiquity, we can find with the help of (4.3') and (4.4') the values of sin 3°,
cos 3°, or, as was then usual, chord 6°. The half-angle formulas (4.11) then allow the
computation of chord 3°, chord 1½°, chord ¾°, but not chord 1°. Ptolemy observed
that chord ¾° is approximately half of chord 1½°. Therefore one might guess that
chord 1° ≈ (2/3) · chord 1½°, which gives, in base 60 (see Aaboe 1964, p. 121),

(4.19)    chord 1° = 0; 1, 2, 50    (correct value 0; 1, 2, 49, 51, 48, 0, 25, 27, 22, . . .).
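
To see how good Ptolemy’s guess is, one can convert both values to base 60 on a computer. The sketch below assumes a circle of radius 1, so that chord α = 2 sin(α/2), and uses ordinary double precision, which is reliable only for the first few sexagesimal digits. The helper `sexagesimal_digits` is a name of our own.

```python
import math

def sexagesimal_digits(value: float, count: int):
    """First `count` base-60 digits of the fractional part of 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 60
        d = int(value)
        digits.append(d)
        value -= d
    return digits

chord_1 = 2 * math.sin(math.radians(0.5))       # chord 1 degree
chord_1_5 = 2 * math.sin(math.radians(0.75))    # chord 1.5 degrees
print(sexagesimal_digits(chord_1, 5))            # exact value, truncated
print(sexagesimal_digits(2 / 3 * chord_1_5, 5))  # Ptolemy's approximation
```

The two digit lists agree in the first three places and start to differ in the fourth, which is consistent with the rounded value in (4.19).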

Then, the values of sin and cos for all the angles 2°, 3°, 4°, etc. are obtained with 
the help of (4.14). Around 1464, Regiomontanus computed a table (“SEQVITVR 
NVNC EIVSDEM IOANNIS Regiomontani tabula sinuum, per singula minuta 
extensa . . .”) giving the sine of all angles at intervals of 1 minute, with five decimals.
Fig. 4.9 shows a table of tan x written in his hand (usually with four correct
decimals).



FIGURE 4.9. Autographic table of tan α by Regiomontanus (see Kaunzner 1980) 3


A very precise computation of sin 1° was made by Al-Kashi (Samarkand in 
1429) by solving numerically the equation (see Eq. (1.9)) 

$$-4x^3 + 3x = \sin 3^\circ \tag{4.20}$$

with the help of an iterative method and giving the solution in base 60 (“We ex- 
tracted it by inspired strength from the Eternal Presence . . .”, see A. Aaboe 1954) 

sin 1° = 0; 1, 2, 49, 43, 11, 14, 44, 16, 19, 16 ... . 

Here is the true value in base 60 calculated by a modern computer, 

sin 1° = 0; 1, 2, 49, 43, 11, 14, 44, 16, 26, 18, 28, 49, 20, 26, 50, 41, ... . 
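
Equation (4.20) lends itself to a simple fixed-point iteration, because the cubic term is tiny for x near sin 1°. The following Python sketch rewrites the equation as x = (sin 3° + 4x³)/3 and iterates; we do not claim that this reproduces the details of Al-Kashi’s own scheme, and the value of sin 3° is taken here from the library instead of from Table 4.1.

```python
import math

def sin1_from_sin3(sin3: float, steps: int = 10) -> float:
    """Solve -4x^3 + 3x = sin3 by iterating x <- (sin3 + 4x^3) / 3."""
    x = sin3 / 3                      # starting value: neglect the cubic term
    for _ in range(steps):
        x = (sin3 + 4 * x**3) / 3
    return x

sin3 = math.sin(math.radians(3.0))
x = sin1_from_sin3(sin3)
print(x, math.sin(math.radians(1.0)))
```

Each step reduces the error by a factor of roughly 4x² ≈ 0.0012, so a handful of iterations already exhausts double precision.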

3 Reproduced with permission of Nürnberger Stadtbibliothek, Cent V, 63, f. 30r.

Once again, we see the enormous progress of the series method (4.17), which
gives sin 1° = sin(π/180) = sin(0.0174532925 . . .) with only three terms as

    sin 1° ≈ 0.0174532925199 − 0.0000008860962 + 0.000000000013496
           ≈ 0.0174524064373 .


Inverse Trigonometric Functions 

Trigonometric functions define sin x, cos x, tan x, for a given arc x. Inverse
trigonometric functions define the arc x as a function of sin x, cos x, or tan x.

(4.3) Definition. Consider a right-angled triangle with hypotenuse 1. If x denotes
the length of the leg opposite the angle, arcsin x is the length of the
arc (see Fig. 4.10a). The values arccos x and arctan x are defined analogously
(Figs. 4.10b and 4.10c).



Because of the periodicity of the trigonometric functions, the inverse trigono- 
metric functions are multivalued. The so-called principal branches satisfy the fol- 
lowing inequalities: 

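$$\begin{aligned}
y = \arcsin x \quad&\Longleftrightarrow\quad x = \sin y &&\text{for } -1 \le x \le 1, \quad -\pi/2 \le y \le \pi/2, \\
y = \arccos x \quad&\Longleftrightarrow\quad x = \cos y &&\text{for } -1 \le x \le 1, \quad 0 \le y \le \pi, \\
y = \arctan x \quad&\Longleftrightarrow\quad x = \tan y &&\text{for } -\infty < x < \infty, \quad -\pi/2 < y < \pi/2.
\end{aligned}$$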

Series for arctan x. 

If one really exposes something, it is better to give no proof, or such a proof
which doesn’t let them discover our tricks (Es ist aber guth, dass wann man
etwas würklich exhibiret, man entweder keine demonstration gebe, oder eine
solche, dadurch sie uns nicht hinter die schliche kommen.)

(Letter of Leibniz; quoted from Euler’s Opera Omnia, vol. 27, p. xxvii)

The series for arctan x was discovered by Gregory in 1671. In 1674, Leibniz re- 
discovered it and published the formula in 1682 in the Acta Eruditorum, enthusing 
about the kindness of the Lord but without disclosing the path that led him to the 
result (see citation). We therefore seek inspiration in Newton’s treatment of the
series for arcsin x in the manuscript De Analysi, written in 1669 but published only
40 years later (see formula (4.25) below). One can either compute the arc length 
or the area of the corresponding circular sector. The relation between the two is 
known since Archimedes (“Proposition 1” of On the measurement of the circle), 
and is also displayed by Kepler in Fig. 4.12. 



FIGURE 4.11. The derivation of the series for y = arctan x (panels a and b)


FIGURE 4.12. The area of the circle seen by Kepler 1615 4

Let x, a given value, be the tangent of an angle whose arc y = arctan x we want
to determine (see Fig. 4.11a). Because of Pythagoras’ Theorem, we have

$$OA = \sqrt{1+x^2}. \tag{4.21}$$

By Thales’ Theorem, applied to the two larger similar triangles shaded in grey, we
have

$$OB = \frac{1}{\sqrt{1+x^2}} \qquad\text{and also}\qquad \Delta u = \frac{\Delta x}{\sqrt{1+x^2}}. \tag{4.22}$$

By orthogonal angles, the small grey triangle is also similar to the two other ones,
and we have consequently

$$\Delta y = \frac{\Delta u}{OA} = \frac{\Delta x}{1+x^2}. \tag{4.23}$$

4 Reproduced with permission of Bibl. Publ. Univ. Genève.

This means that the infinitesimal arc length Δy is equal to the shaded area in
Fig. 4.11b. The wanted arc y is therefore equal to the total area between 0 and x
below

$$\frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + \ldots \qquad\text{(see (2.12))},$$

i.e., by Theorem 3.2 (Fermat),

$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \ldots , \tag{4.24}$$

which is valid for |x| < 1.
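
The behaviour of the arctan series is easy to observe numerically. The following Python sketch sums the first terms of x − x³/3 + x⁵/5 − ... and compares the result with the library function; the name `arctan_series` is our own. It also shows that the convergence deteriorates dramatically as |x| approaches 1.

```python
import math

def arctan_series(x: float, terms: int) -> float:
    """Partial sum x - x^3/3 + x^5/5 - ... with `terms` terms."""
    total, power = 0.0, x
    for k in range(terms):
        total += (-1) ** k * power / (2 * k + 1)
        power *= x * x
    return total

for x in (0.2, 0.5, 1.0):
    approx = arctan_series(x, 20)
    print(x, approx, math.atan(x), abs(approx - math.atan(x)))
```

With 20 terms the error is at the level of rounding for x = 0.2, but still of the order of 0.01 for x = 1.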

Series for arcsin x. 

A friend that hath a very excellent genius to those things, brought me the 
other day some papers, wherein he hath sett downe methods of calculating the 
dimensions of magnitudes like that of Mr Mercator concerning the hyperbola,
but very generall. . . His name is Mr Newton; a fellow of our College, & very
young . . . but of an extraordinary genius & proficiency in these things. 

(Letter of Barrow to Collins 1669, quoted from Westfall 1980, p. 202) 

After the publication of Mercator’s book towards the end of 1668, in which the 
series for ln(l + x) was published, Newton hastened to show his manuscript De 
Analysi (Newton 1669) to some of his friends, but did not allow its publication. 
It was finally inserted as the first chapter of Analysis per quantitatum (Newton 
1711) published by W. Jones. Newton had not only found Mercator’s series much 
earlier, but was the first to discover the series 


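$$\arcsin x = x + \frac{1}{2}\,\frac{x^3}{3} + \frac{1\cdot 3}{2\cdot 4}\,\frac{x^5}{5} + \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\,\frac{x^7}{7} + \ldots \tag{4.25}$$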

and also the series for sin x and cos x (see Exercise 4.1). Newton’s proof for (4.25) 
was as follows. 


Proof. We suppose x given and want to compute the arc y for which x = sin y
(see Fig. 4.13). If x increases by Δx, then y increases by Δy, which is

$$\Delta y \approx \frac{\Delta x}{\sqrt{1-\sin^2 y}} = \frac{\Delta x}{\sqrt{1-x^2}} \tag{4.26}$$

because the two shaded triangles in Fig. 4.13 are similar. This quantity is the
area of a rectangle of width Δx and height $1/\sqrt{1-x^2}$. Therefore, similarly to
Fig. 4.11c, the total arc length y is equal to the area below the function $1/\sqrt{1-x^2}$
between 0 and x. Expanding this function by the Binomial Theorem 2.2 gives with
a = −1/2

$$\frac{1}{\sqrt{1-x^2}} = 1 + \frac{1}{2}x^2 + \frac{1\cdot 3}{2\cdot 4}x^4 + \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}x^6 + \ldots \tag{4.27}$$

and we obtain formula (4.25), once again, by replacing the functions $1, x^2, x^4, \ldots$
by their areas (Theorem 3.2) $x, x^3/3, x^5/5, \ldots$. □

FIGURE 4.13. Proof of (4.25) for y = arcsin x; illustration from Newton (1669) 5


Computation of Pi 

. . . you will not deny that you have discovered a very remarkable property 
of the circle, which will forever be famous among geometers. 

(Letter of Huygens to Leibniz, November 7, 1674) 
Theref. the Diameter is to the Periphery, as 1,000,&c. to 3.141592653.589
7932384.6264338327.9502884197.1693993751.0582097494.4592307816
.4062862089.9862803482.5342117067.9+, True to above a hundred Places;
as Computed by the Accurate and Ready Pen of the Truly Ingenious Mr. 
John Machin: Purely as an Instance of the Vast advantage Arithmetical 
Calculations receive from the Modern Analysis, in a Subject that has bin 
of so Engaging a Nature, as to have employ’d the Minds of the most Em- 
inent Mathematicians, in all Ages, to the Consideration of it. . . . But the 
Method of Series (as improv’d by Mr. Newton, and Mr. Halley) performs 
this with great Facility, when compared with the Intricate and Prolix Ways 
of Archimedes, Vieta, Van Ceulen, Metius, Snellius, Lansbergius, &c. 

(W. Jones 1706) 

Archimedes (283-212 B.C.) obtained, by calculating the perimeters of the regular 
polygons of n = 6, 12, 24, 48, 96 sides and by repeated use of formulas (4.11),

$$3\tfrac{10}{71} < \pi < 3\tfrac{1}{7}. \tag{4.28}$$

All attempts made in the Middle Ages to improve on this value were fruitless. Fi- 
nally, by applying Archimedes’ method, Adrien van Roomen (in 1580) succeeded 
in obtaining 20 decimals after years of calculation. Ludolph van Ceulen (= Köln)
(in 1596, 1616) computed 35 decimals, which for a long time decorated Ludolph’s
tombstone in St. Peter’s Cathedral in Leiden (Holland). In order to reach this
precision, Ludolph had to continue the calculations up to $n = 6\cdot 2^{60}$.

Leibniz’s Series. From Table 4.1 we know that tan(π/4) = 1 and consequently
arctan(1) = π/4. Putting x = 1 in (4.24), we find the famous series of Leibniz
(1682)

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \ldots . \tag{4.29}$$

5 The right-hand picture of Fig. 4.13 is printed with permission of Bibl. Univ. Genève.

Although we agree with Leibniz about the undeniable beauty of his formula (“The
Lord loves odd numbers”, see Fig. 4.14), we also see that it is totally inefficient for
practical computations, since for 50 decimals we would have to add $10^{50}$ terms
with “labor fere in aeternum” (Euler 1737).



FIGURE 4.14. Leibniz’s illustration for series (4.29) 6

Much more efficient is the use of tan(π/6) = 1/√3 (see Table 4.1), which
leads to the formula

$$\pi = 2\sqrt{3}\left(1 - \frac{1}{3\cdot 3} + \frac{1}{5\cdot 3^2} - \frac{1}{7\cdot 3^3} + \frac{1}{9\cdot 3^4} - \ldots\right), \tag{4.30}$$

with which, by adding 210 terms “exhibitus incredibili labore”, Th. F. de Lagny
computed in 1719 the value displayed at the beginning of this section. The series
(4.25) for arcsin x can also be used; for example, because of sin(π/6) = 1/2, we
have

$$\frac{\pi}{6} = \frac{1}{2} + \frac{1}{2}\cdot\frac{1}{3\cdot 2^3} + \frac{1\cdot 3}{2\cdot 4}\cdot\frac{1}{5\cdot 2^5} + \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\cdot\frac{1}{7\cdot 2^7} + \ldots . \tag{4.31}$$
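
The difference in efficiency between the series (4.29) and (4.30) is easily observed with a few lines of Python. The function names `pi_leibniz` and `pi_sqrt3` are our own, and the comparison is done in ordinary double precision, so it can only illustrate the first 15 or so decimals.

```python
import math

def pi_leibniz(terms: int) -> float:
    """4 * (1 - 1/3 + 1/5 - ...), i.e. series (4.29) multiplied by 4."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

def pi_sqrt3(terms: int) -> float:
    """2*sqrt(3) * (1 - 1/(3*3) + 1/(5*3^2) - ...), i.e. series (4.30)."""
    return 2 * math.sqrt(3) * sum(
        (-1) ** k / ((2 * k + 1) * 3**k) for k in range(terms)
    )

for n in (10, 20, 30):
    print(n, abs(pi_leibniz(n) - math.pi), abs(pi_sqrt3(n) - math.pi))
```

Each additional term of (4.30) gains roughly half a decimal digit, whereas the error of (4.29) decreases only like the reciprocal of the number of terms.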


Composite Formulas. We insert u = tan x and v = tan y into (4.5) and obtain

$$\arctan u + \arctan v = \arctan\frac{u+v}{1-uv} \tag{4.32}$$

if |arctan u + arctan v| < π/2. If we set u = 1/2 and v = 1/3, we see that the
fraction to the right of (4.32) is equal to 1. This gives Euler’s formula (1737),

$$\frac{\pi}{4} = \arctan\frac{1}{2} + \arctan\frac{1}{3}, \tag{4.33}$$

for which the series (4.24) already converges much better.

6 Reproduced with permission of Bibl. Publ. Univ. Genève.

Especially attractive is the approach of John Machin, published (without details)
in W. Jones (1706, p. 243). Putting u = v = 1/5, we get

$$2\cdot\arctan\frac{1}{5} = \arctan\left(\frac{2/5}{1-1/25}\right) = \arctan\frac{5}{12}.$$

For u = v = 5/12 one obtains

$$4\cdot\arctan\frac{1}{5} = \arctan\left(\frac{2\cdot 5/12}{1-25/144}\right) = \arctan\frac{120}{119}.$$

Finally, we put u = 120/119 and search for a v such that

$$\frac{u+v}{1-uv} = 1, \qquad\text{hence}\qquad v = \frac{1-u}{1+u} = -\frac{1}{239}.$$

All these formulas together give

$$\frac{\pi}{4} = 4\cdot\arctan\frac{1}{5} - \arctan\frac{1}{239}, \tag{4.34}$$
an expression for which the series (4.24) is particularly attractive for calculations 
in base 10 (see Table 4.2). “The Accurate and Ready Pen” of Machin found 100 
decimals in this way. 


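TABLE 4.2. Computation of π by Machin’s formula

Terms of the series (4.24) for arctan(1/5), by exponent k:

   1    0.200000000000000000000000000
   3   -0.002666666666666666666666667
   5    0.000064000000000000000000000
   7   -0.000001828571428571428571429
   9    0.000000056888888888888888889
  11   -0.000000001861818181818181818
  13    0.000000000063015384615384615
  15   -0.000000000002184533333333333
  17    0.000000000000077101176470588
  19   -0.000000000000002759410526316
  21    0.000000000000000099864380952
  23   -0.000000000000000003647220870
  25    0.000000000000000000134217728
  27   -0.000000000000000000004971027
  29    0.000000000000000000000185128
  31   -0.000000000000000000000006927
  33    0.000000000000000000000000260
  35   -0.000000000000000000000000010

  arctan(1/5)   = 0.197395559849880758370049763

Terms of the series (4.24) for arctan(1/239), by exponent k:

   1    0.004184100418410041841004184
   3   -0.000000024416591787083803627
   5    0.000000000000256472314424647
   7   -0.000000000000000003207130658
   9    0.000000000000000000000043669
  11   -0.000000000000000000000000001

  arctan(1/239) = 0.004184076002074723864538214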
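
A computation in the spirit of Table 4.2 can be repeated with integer arithmetic, where every number is stored as an integer multiple of a small unit. The following Python sketch is our own arrangement and not Machin’s; in particular the ten guard digits and the truncating integer divisions are arbitrary choices of this illustration.

```python
def arctan_inverse(n: int, scale: int) -> int:
    """Integer approximation of scale * arctan(1/n) by the arctan series."""
    power = scale // n            # scale / n^(2k+1), truncated
    total, k, sign = 0, 0, 1
    while power:
        total += sign * (power // (2 * k + 1))
        power //= n * n
        sign, k = -sign, k + 1
    return total

def machin_pi(digits: int) -> str:
    """Decimal expansion of pi from pi/4 = 4 arctan(1/5) - arctan(1/239)."""
    guard = 10                    # extra digits absorbing truncation errors
    scale = 10 ** (digits + guard)
    pi_scaled = 4 * (4 * arctan_inverse(5, scale) - arctan_inverse(239, scale))
    s = str(pi_scaled // 10**guard)
    return s[0] + "." + s[1:]

print(machin_pi(50))
```

Because the powers of 1/5 are so easy to form in base 10, the series for arctan(1/5) needs only divisions by 25 and by small odd numbers, which is what makes the formula attractive for hand calculation as well.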


The search for other formulas of this type becomes a problem of number theory.
Gauss, as a by-product of 20 pages of factorization tables, found (see Werke,
vol. 2, p. 477-502)

$$\frac{\pi}{4} = 12\arctan\frac{1}{18} + 8\arctan\frac{1}{57} - 5\arctan\frac{1}{239},$$

$$\frac{\pi}{4} = 12\arctan\frac{1}{38} + 20\arctan\frac{1}{57} + 7\arctan\frac{1}{239} + 24\arctan\frac{1}{268}.$$

Today, several million digits of π have been calculated. See Shanks & Wrench Jr.
(1962) for a list of the first 100 000 decimals (the 100 000th digit is a 6). More
details about old and recent history can be found in Miel (1983).

Exercises


4.1 (Newton 1669, “Inventio Basis ex data Longitudine Curvæ”). Having found
the series $z = x + \frac{1}{6}x^3 + \frac{3}{40}x^5 + \frac{5}{112}x^7 + \ldots$ for the arcsin (see (4.25)),
discover the series for x = sin z in the form $x = z + a_3 z^3 + a_5 z^5 + a_7 z^7 + \ldots$
(similar to Exercise 3.2) and that of w = cos z by expanding $w = \sqrt{1-x^2}$
(see Fig. 4.15).


FIGURE 4.15. Extract from Newton (1669), p. 17 7


4.2 Understand Ptolemy’s original proof of the addition theorems (4.3) and (4.4)
for the chord function (see Fig. 4.16).


FIGURE 4.16. Ptolemy’s proof of formula for chord(α + β); from Almagest, transl. by
Regiomontanus, printed 1496 7


Hint. Use (and/or prove) “Ptolemy’s Lemma”, which states that the sides and
diagonals of a quadrilateral inscribed in a circle satisfy $ac + bd = \delta_1\delta_2$. For
the proof of the lemma, draw a line DE such that angle EDA equals angle CDB.
So we have similar triangles

$$EDA \sim CDB \;\Rightarrow\; b/\delta_1 = u/d$$

$$DCE \sim DBA \;\Rightarrow\; a/\delta_1 = v/c$$

whence $bd + ac = (u+v)\,\delta_1 = \delta_1\delta_2$.


7 Figs. 4.15 and 4.16 are reproduced with permission of Bibl. Publ. Univ. Genève.





4.3 The hyperbolic functions (Foncenex 1759, Lambert 1770b). For a given x
let P be the point on the hyperbola $u^2 - v^2 = 1$ such that the shaded area of
Fig. 4.17 (left) is equal to x/2. Then, the coordinates of this point are denoted
by (cosh x, sinh x).

a) Prove that

$$\cosh x = \frac{e^x + e^{-x}}{2}, \qquad \sinh x = \frac{e^x - e^{-x}}{2}. \tag{4.35}$$

Hint. The areas of the triangles ACB and PCQ are equal. Hence, the areas of
ACPA and ABQPA are also equal and are equal to (ln a)/2, if the distance
between C and Q is denoted by a/√2 (Fig. 4.17, right).

b) Verify the relations

$$\sinh(x+y) = \sinh x\cosh y + \cosh x\sinh y$$

$$\cosh(x+y) = \cosh x\cosh y + \sinh x\sinh y.$$

c) The inverse functions of (4.35), the area functions, are defined by

$$\begin{aligned}
y = \operatorname{arsinh} x \quad&\Longleftrightarrow\quad x = \sinh y &&\text{for } -\infty < x < \infty, \quad -\infty < y < \infty, \\
y = \operatorname{arcosh} x \quad&\Longleftrightarrow\quad x = \cosh y &&\text{for } 1 \le x < \infty, \quad 0 \le y < \infty.
\end{aligned}$$

Prove that $\operatorname{arsinh} x = \ln\left(x + \sqrt{x^2+1}\right)$, $\operatorname{arcosh} x = \ln\left(x + \sqrt{x^2-1}\right)$.


FIGURE 4.17. Definition of hyperbolic functions


4.4 Verify (and use) Newton’s advice (Newton 1671, Probl. IX, §XLIX) for the
computation of π: by computing the area a under the circle $y = x^{1/2}(1-x)^{1/2}$
between x = 0 and x = 1/4 by binomial series expansion, show that

$$\pi = 24a + \frac{3\sqrt{3}}{4} = 24\left(\frac{2}{3\cdot 2^3} - \frac{1}{2}\cdot\frac{2}{5\cdot 2^5} - \frac{1\cdot 1}{2\cdot 4}\cdot\frac{2}{7\cdot 2^7} - \frac{1\cdot 1\cdot 3}{2\cdot 4\cdot 6}\cdot\frac{2}{9\cdot 2^9} - \ldots\right) + \frac{3\sqrt{3}}{4}.$$

I.5 Complex Numbers and Functions


Neither the true nor the false roots are always real; sometimes they are 
imaginary; that is, while we can always imagine as many roots for each 
equation as I have assigned, yet there is not always a definite quantity cor- 
responding to each root we have imagined. (Descartes 1637) 

Cardano (1545, in his Ars Magna) was the first to encounter complex numbers by
asking the following question: divide a given line ab, say, of length 10 “in duas
partes”, so that the rectangle with these two parts as sides has area 40. Everybody
can see (see Fig. 5.1) that the area of such a rectangle is at most 25, so the problem
has no real solution. But algebra gives us a solution, since the corresponding
equation (see Eq. (1.3)) $x^2 - 10x + 40 = 0$ leads to (“ideo imaginaberis $\sqrt{-15}$”)

$$5 + \sqrt{-15} \qquad\text{and}\qquad 5 - \sqrt{-15}.$$

Although these formulas are perfectly useless and sophistic (“quæ uere est sophistica”),
they must contain an amount of truth, since their product

$$(5 + \sqrt{-15})(5 - \sqrt{-15}) = 25 - (-15) = 40$$

is actually what we want (see Fig. 5.1).


FIGURE 5.1. Excerpts from Cardano’s Ars Magna 1

During the following centuries, such “impossible” or “imaginary” (Descartes,
see quotation) solutions of algebraic equations came up again and again, gave rise
to many disputes, but proved to be more and more useful. Full maturity in their
handling was achieved in the work of Euler, who also introduced later in his life
the symbol i for $\sqrt{-1}$. The above values are now written as $5 \pm i\sqrt{15}$ and complex
numbers are of the general form

$$c = a + ib,$$

where a = Re (c) is called the real part, and b = Im (c) the imaginary part of
c. The interpretation of a complex number a + ib as the point (a, b) in the
two-dimensional complex plane is due to Gauss’ thesis (1799) (see Fig. 5.2) and to
Argand in 1806 (see Kline 1972, p. 630).


1 Reproduced with permission of Bibl. Publ. Univ. Genève.

Complex Operations. For computation with complex numbers we keep in mind
the relation $i^2 = -1$ and apply the usual rules for rational or real numbers. Therefore,
the sum (or the difference) of two complex numbers

$$c = a + ib, \qquad w = u + iv$$

is the complex number obtained by adding (or subtracting) the real and imaginary
parts. The product becomes (compare with Fig. 5.1)

$$c\cdot w = au - bv + i(av + bu). \tag{5.1}$$

To compute the quotient w/c we observe that the product of c with its complex
conjugate

$$\bar{c} = a - ib \tag{5.2}$$

is real and nonnegative, namely $c\cdot\bar{c} = a^2 + b^2$. Multiplying numerator and
denominator of w/c by $\bar{c}$, the quotient w/c becomes for c ≠ 0

$$\frac{w}{c} = \frac{w\cdot\bar{c}}{c\cdot\bar{c}} = \frac{au + bv}{a^2 + b^2} + i\,\frac{av - bu}{a^2 + b^2}. \tag{5.3}$$
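
Formula (5.3) translates directly into a small routine working with real numbers only. The following Python sketch uses a function name of our own (`complex_quotient`) and compares the result with Python’s built-in complex division.

```python
def complex_quotient(u: float, v: float, a: float, b: float):
    """Real and imaginary parts of (u + iv) / (a + ib), following (5.3)."""
    d = a * a + b * b             # c times its complex conjugate
    if d == 0:
        raise ZeroDivisionError("c must be different from zero")
    return (a * u + b * v) / d, (a * v - b * u) / d

# w = 1 + 2i divided by c = 3 - 4i
q_re, q_im = complex_quotient(1.0, 2.0, 3.0, -4.0)
print(q_re, q_im, complex(1, 2) / complex(3, -4))
```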


Euler’s Formula and Its Consequences 

. . . how imaginary exponentials are expressed in terms of the sine and cosine
of real arcs. (Euler 1748, Introductio, §138)

This formula, discovered by Euler in 1740 by studying differential equations of
the form y'' + y = 0 (see Sect. II.8), is the key to understanding operations with
complex numbers.


2 Reproduced with permission of Georg Olms Verlag.

We define $e^{ix}$ by the series of Theorem 2.3 (with x replaced by ix), use the
relations $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, $i^5 = i$, . . ., and separate real and imaginary
parts:

$$\begin{aligned}
e^{ix} &= 1 + ix + \frac{(ix)^2}{2!} + \frac{(ix)^3}{3!} + \frac{(ix)^4}{4!} + \frac{(ix)^5}{5!} + \ldots \\
&= \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots\right).
\end{aligned}$$

The result is the famous formula (Euler 1743, Opera Omnia, vol. 14, p. 142)

$$e^{ix} = \cos x + i\sin x. \tag{5.4}$$

As a first application, we insert the particular values x = π/2 and x = π, which
give

$$e^{i\pi/2} = i \qquad\text{and}\qquad e^{i\pi} = -1,$$

elegant formulas combining the famous mathematical constants π, e, and i in
wonderfully simple expressions.
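
Euler’s formula can be checked numerically by summing the exponential series for a purely imaginary argument. The following Python sketch is only an illustration; the function name `exp_series` and the choice of 40 terms are our own.

```python
import math

def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum 1 + z + z^2/2! + ... of the exponential series."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

x = 1.2
print(exp_series(1j * x), complex(math.cos(x), math.sin(x)))
print(exp_series(1j * math.pi))   # close to -1
```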

Polar Coordinates. Equation (5.4) shows that the point $e^{i\varphi}$ has real part cos φ
and imaginary part sin φ, i.e., it is the point on the unit circle at which the radius
forms an angle φ with the real axis (see Figs. 5.2a and 4.3). Consequently, each
complex number can be written as

$$c = a + ib = r\cdot e^{i\varphi}, \tag{5.5}$$

where

$$r = \sqrt{a^2 + b^2} = \sqrt{c\cdot\bar{c}} \qquad\text{and}\qquad \varphi = \arctan\left(\frac{b}{a}\right). \tag{5.6}$$

We call r = |c| the absolute value of c and φ = arg(c) its argument. Let

$$c = r\cdot e^{i\varphi} \qquad\text{and}\qquad w = s\cdot e^{i\theta}$$

be two complex numbers in polar coordinate representation. It follows from (4.2a)
that $\bar{c} = r\cdot e^{-i\varphi}$ and from Theorem 4.2 that

$$\begin{aligned}
e^{i\varphi}\cdot e^{i\theta} &= (\cos\varphi + i\sin\varphi)\cdot(\cos\theta + i\sin\theta) \\
&= (\cos\varphi\cos\theta - \sin\varphi\sin\theta) + i(\cos\varphi\sin\theta + \sin\varphi\cos\theta) = e^{i(\varphi+\theta)}.
\end{aligned} \tag{5.7}$$

Therefore, we obtain for the product and quotient

$$c\cdot w = rs\cdot e^{i(\varphi+\theta)}, \qquad \frac{w}{c} = \frac{s}{r}\cdot e^{i(\theta-\varphi)}. \tag{5.8}$$

Here the polar coordinate form is especially illuminating: multiplication multiplies
the radii and adds the angles; division divides the radii and subtracts the angles.

Roots. We wish to know, say, $\sqrt[3]{c}$. Once again, polar coordinates perform the
miracle, since roots of products are the products of the roots. However, we must
be careful, because $e^{2i\pi} = 1$ and $e^{4i\pi} = 1$ have cube roots $e^{2i\pi/3}$ and $e^{4i\pi/3}$,
which are different from 1. Thus, there are three cube roots of c,

$$\sqrt[3]{c} = \sqrt[3]{r}\cdot e^{i\varphi/3}, \qquad \sqrt[3]{r}\cdot e^{i(\varphi/3 + 2\pi/3)}, \qquad \sqrt[3]{r}\cdot e^{i(\varphi/3 + 4\pi/3)}. \tag{5.9}$$

These, for c = 3 + 2i, are displayed in Fig. 5.2a. The next candidate, $e^{6i\pi} = 1$,
just reproduces the first of the roots and gives nothing new. The roots thus obtained
form a regular star; of Mercedes-type for n = 3, of Handel’s Fire-Musick-type for
n > 3. Fig. 5.3 represents the map $z \mapsto w = z^3$ for varying values of z and its
inverse function $w \mapsto z = w^{1/3} = \sqrt[3]{w}$. The animal that thereby undergoes
painful deformations is known as “Arnold’s cat”. The inverse map produces three
cats out of one.
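
The construction of the roots from the polar form can be written down in a few lines of Python with the standard `cmath` module. The function name `nth_roots` is our own; the example uses the value c = 3 + 2i mentioned in the text.

```python
import cmath
import math

def nth_roots(c: complex, n: int):
    """All n-th roots of c != 0, obtained from the polar form of c."""
    r, phi = abs(c), cmath.phase(c)
    return [cmath.rect(r ** (1.0 / n), (phi + 2 * math.pi * k) / n)
            for k in range(n)]

roots = nth_roots(3 + 2j, 3)
for z in roots:
    print(z, z**3)    # each cube reproduces 3 + 2i up to rounding
```

The three roots lie on a circle of radius |c|^(1/3) and are separated by angles of 2π/3, so their sum vanishes.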



Exponential Function and Logarithm. The exponential function can be ex- 
tended to complex arguments as follows: 

$$e^c = e^a\cdot e^{ib} = e^a(\cos b + i\sin b) \qquad\text{for } c = a + ib. \tag{5.10}$$

This definition retains the fundamental property $e^{c+w} = e^c\cdot e^w$, which is obtained
from Eq. (5.7).

The nature of the logarithms of negative numbers gave rise to long and heated
disputes between Leibniz and Joh. Bernoulli. Euler (1751) gave a marvelous survey
of these discussions, which were kept as secret as possible since such disputes
would have damaged the prestige of pure mathematics as an exact and rigorous
science. The true nature of logarithms of negative and complex numbers was then
revealed by Euler (“Dénouement des difficultés précédentes”) with the help, once
again, of Eq. (5.4). Many of the contradictions of the earlier disputes were resolved
by the fact that the logarithm of a complex number does not represent one number,
but an infinity of values. We write $c$ in polar coordinate form

$c = r \cdot e^{i(\varphi + 2k\pi)}, \qquad k = 0, \pm 1, \pm 2, \ldots,$

which is a product. In order to retain properties (3.1) and (3.7) for the logarithm
with complex arguments, we define

(5.11)    $\ln(c) = \ln(r) + i(\varphi + 2k\pi), \qquad k = 0, \pm 1, \pm 2, \ldots.$

Fig. 5.4 represents the map $w = e^z$ and its inverse. Since the imaginary
part of the logarithm is simply $\varphi = \arg(c)$, it is clear that, after each rotation
$\varphi \mapsto \varphi + 2\pi$, the logarithms repeat again and again.
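
The many-valuedness in (5.11) is easy to observe numerically. The following sketch (the function name `log_branch` is hypothetical, chosen here for illustration) builds the value of $\ln(c)$ belonging to a given $k$ and confirms that every one of them is sent back to $c$ by the exponential function.

```python
import cmath
import math

def log_branch(c: complex, k: int) -> complex:
    """Value number k of ln(c) as in (5.11); k = 0 uses the angle returned by cmath.polar."""
    r, phi = cmath.polar(c)                        # c = r * e^{i*phi}
    return complex(math.log(r), phi + 2 * math.pi * k)

c = -1 + 0j                                        # a negative number, as in the old disputes
for k in (-1, 0, 1):
    w = log_branch(c, k)
    print(k, w, cmath.exp(w))                      # exp(w) should be close to -1 for every k
```

For $c = -1$ the value with $k = 0$ is $i\pi$, so the logarithm of a negative number is never real.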



FIGURE 5.4. The function $w = e^z$ and its inverse $z = \ln w$


A New View on Trigonometric Functions 

The shortest path between two truths in the real domain passes 
through the complex domain. 

(Jacques Hadamard; quoted from Kline (1972), p. 626)

Replacing $x$ in (5.4) by $-x$ we have $e^{-ix} = \cos x - i \sin x$; by then adding and
subtracting these formulas we obtain

(5.12)    $\sin x = \dfrac{e^{ix} - e^{-ix}}{2i}$

(5.13)    $\cos x = \dfrac{e^{ix} + e^{-ix}}{2}$

(5.14)    $\tan x = \dfrac{e^{ix} - e^{-ix}}{i\,(e^{ix} + e^{-ix})}.$


Thus, in the complex domain, trigonometric functions are closely related to the
exponential function. Many formulas of Sect. I.4 become connected with those
for $e^x$; e.g., de Moivre’s formulas (4.14) simply state that $e^{inx} = (e^{ix})^n$. This is
not a new proof, however, as we based it on Eq. (5.4), which was deduced from the
series of (4.16) and (4.17), which were in turn proved using de Moivre’s formulas.


Inverse Trigonometric Functions. If we insert in (5.12), (5.13), or (5.14) a variable
$u$ for $e^{ix}$ and $v$ for either $\sin x$, $\cos x$, or $\tan x$, we obtain algebraic relations
that can be solved for $u$. As a result, the inverse trigonometric functions are expressed
with the help of the complex logarithm as follows:

(5.15)    $\arcsin x = -i \ln\left(ix + \sqrt{1 - x^2}\right)$

(5.16)    $\arccos x = -i \ln\left(x + i\sqrt{1 - x^2}\right)$

(5.17)    $\arctan x = \dfrac{i}{2} \ln\left(\dfrac{i + x}{i - x}\right).$


Since the logarithmic function is many-valued, attention must be drawn to the correct
branch (i.e., value of $k$ in (5.11)) of the function to be used. The last formula
explains the striking similarity between the series of Eq. (4.24) for $y = \arctan x$
and Gregory’s series (3.15) for $\ln((1 + x)/(1 - x))$. Also, Machin’s formula of
Eq. (4.34) becomes equivalent to the factorization of the complex numbers

(5.18)    $\dfrac{i + 1}{i - 1} = \left(\dfrac{5i + 1}{5i - 1}\right)^4 \cdot \left(\dfrac{239i + 1}{239i - 1}\right)^{-1}.$
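
The factorization (5.18) can be checked directly with floating-point complex arithmetic. The sketch below is only a numerical sanity check, not a proof; the variable names are ours. It also evaluates Machin's arctangent combination $4\arctan\frac15 - \arctan\frac1{239}$ in the real domain for comparison with $\pi/4$.

```python
import math

# Both sides of (5.18), evaluated with Python's built-in complex numbers.
lhs = (1j + 1) / (1j - 1)
rhs = ((5j + 1) / (5j - 1)) ** 4 / ((239j + 1) / (239j - 1))
print(lhs, rhs)                          # both should be close to -1j

# The corresponding real statement: Machin's combination of arctangents.
machin = 4 * math.atan(1 / 5) - math.atan(1 / 239)
print(machin, math.pi / 4)
```

Each quotient $(ni+1)/(ni-1)$ has modulus 1, so the identity is purely a statement about angles, which is why it translates into a relation between arctangents.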


Euler’s Product for the Sine Function 


. . . and I already see a way for finding the sum of this row $\frac{1}{1} + \frac{1}{4} + \frac{1}{9} + \frac{1}{16}$ etc.

(Joh. Bernoulli, May 22, 1691, letter to his brother)
One of the great mathematical challenges of the early 18th century was to find an
expression for the sum of reciprocal squares

(5.19)    $1 + \dfrac{1}{4} + \dfrac{1}{9} + \dfrac{1}{16} + \dfrac{1}{25} + \ldots$


Joh. Bernoulli eagerly sought this expression for many decades. Euler (1740)
then found the following elegant solution: we know from algebra that, e.g.,

(5.20)    $1 - Ax + Bx^2 - Cx^3 = (1 - \alpha x)(1 - \beta x)(1 - \gamma x),$

where $1/\alpha$, $1/\beta$, $1/\gamma$ are the roots of the polynomial $1 - Ax + Bx^2 - Cx^3$.
Furthermore, the first of the so-called “Viète’s identities” is

(5.21)    $A = \alpha + \beta + \gamma.$


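Now, we apply the same principle fearlessly to the infinite series

(5.22)    $\dfrac{\sin x}{x} = 1 - \dfrac{x^2}{6} + \dfrac{x^4}{120} - \dfrac{x^6}{5040} + \ldots$

with its infinite number of roots $\pm\pi, \pm 2\pi, \pm 3\pi, \ldots$ and Eq. (5.20) becomes

$\dfrac{\sin x}{x} = \left(1 - \dfrac{x^2}{\pi^2}\right)\left(1 - \dfrac{x^2}{4\pi^2}\right)\left(1 - \dfrac{x^2}{9\pi^2}\right) \cdots$

Comparing this relation with (5.22), the analog to (5.21) (with $x$ replaced by $x^2$)
becomes

(5.23)    $\dfrac{1}{\pi^2} + \dfrac{1}{4\pi^2} + \dfrac{1}{9\pi^2} + \dfrac{1}{16\pi^2} + \dfrac{1}{25\pi^2} + \ldots = \dfrac{1}{6}$

and the sum (5.19) is $\pi^2/6$. However audacious this argument and however beautiful
its result, its mathematical rigor was poor even by 18th century standards.
Therefore, Euler later looked for a better proof (1748, Introductio, §156). We start
with the factorization of $z^n - 1$.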


Roots of Unity. The polynomial $z^n - 1$ possesses the roots $z = \sqrt[n]{1} = e^{2ik\pi/n}$,
$k = 0, \pm 1, \pm 2, \ldots$. Since $e^{2i\pi} = 1$, only $n$ consecutive values of $k$ give rise to
distinct roots. For example, for $n = 7$ these solutions are

$1, \quad e^{2i\pi/7}, \quad e^{-2i\pi/7}, \quad e^{4i\pi/7}, \quad e^{-4i\pi/7}, \quad e^{6i\pi/7}, \quad e^{-6i\pi/7}.$

A factorization similar to (5.20) is also valid for polynomials with complex roots.
Indeed, if we divide the polynomial $p(z)$ by $(z - c)$ we obtain

$p(z) = (z - c)q(z) + d$

with $d = p(c)$. If $c$ is a root of $p(z)$ we have obtained the factorization $p(z) =
(z - c)q(z)$. Applying the same procedure to $q(z)$, and repeatedly to the resulting
polynomials, a factorization of $p(z)$ into linear factors $(z - c)$ is obtained. For our
polynomial $z^7 - 1$ we thus get

$z^7 - 1 = (z - 1) \cdot (z - e^{2i\pi/7}) \cdot (z - e^{-2i\pi/7}) \cdot (z - e^{4i\pi/7}) \cdot (z - e^{-4i\pi/7}) \cdot (z - e^{6i\pi/7}) \cdot (z - e^{-6i\pi/7}),$


or, in general, 

(5.1) Theorem (Euler 1748, Introductio, Chap. IX). For $n$ odd we have

(5.24)    $z^n - 1 = (z - 1) \prod_{k=1}^{(n-1)/2} (z - e^{2ik\pi/n})(z - e^{-2ik\pi/n}) = (z - 1) \prod_{k=1}^{(n-1)/2} \left(z^2 - 2z \cos\dfrac{2k\pi}{n} + 1\right).$

Proof. The first identity is the factorization derived above. The second one is obtained
with the help of Eq. (5.13). □
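
A quick numerical check of the real factorization in (5.24) may be reassuring. The sketch below (the function name `real_factorization` is our own) evaluates the product of quadratic factors at an arbitrary complex point and compares it with $z^n - 1$ for the case $n = 7$ discussed above.

```python
import math

def real_factorization(z: complex, n: int) -> complex:
    """Right-hand side of (5.24): (z - 1) times (n-1)/2 real quadratic factors, n odd."""
    if n % 2 == 0:
        raise ValueError("the formula is stated for odd n only")
    prod = z - 1
    for k in range(1, (n - 1) // 2 + 1):
        prod *= z * z - 2 * z * math.cos(2 * k * math.pi / n) + 1
    return prod

z = 1.3 + 0.4j                           # an arbitrary test point
print(real_factorization(z, 7), z ** 7 - 1)
```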

By replacing $z \to z/a$ in (5.24) and multiplying by $a^n$ we obtain a slightly
more general result:

(5.25)    $z^n - a^n = (z - a) \prod_{k=1}^{(n-1)/2} \left(z^2 - 2az \cos\dfrac{2k\pi}{n} + a^2\right).$


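We now insert $z = 1 + x/N$, $a = 1 - x/N$ into (5.25) and put $n = N$. This
gives

$\left(1 + \dfrac{x}{N}\right)^N - \left(1 - \dfrac{x}{N}\right)^N = \dfrac{2x}{N} \prod_{k=1}^{(N-1)/2} \left(2 + \dfrac{2x^2}{N^2} - 2\left(1 - \dfrac{x^2}{N^2}\right) \cos\dfrac{2k\pi}{N}\right)$

$\qquad = \dfrac{2x}{N} \prod_{k=1}^{(N-1)/2} 2\left(\left(1 - \cos\dfrac{2k\pi}{N}\right) + \dfrac{x^2}{N^2}\left(1 + \cos\dfrac{2k\pi}{N}\right)\right)$

$\qquad = C_N \cdot x \cdot \prod_{k=1}^{(N-1)/2} \left(1 + \dfrac{x^2}{N^2} \cdot \dfrac{1 + \cos(2k\pi/N)}{1 - \cos(2k\pi/N)}\right).$

Since the coefficient of $x$ in the polynomial $(1 + x/N)^N - (1 - x/N)^N$ equals
2 (see Theorem 2.1), we have $C_N = 2$ for all $N$. For large $N$ the left-hand side
of the above formula becomes $e^x - e^{-x}$ (Theorem 2.3) and, using the fact that
$\cos y \approx 1 - y^2/2$ for small $y$, the $k$th factor in the right-hand side tends to

$\left(1 + \dfrac{x^2}{k^2\pi^2}\right).$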


Therefore, we obtain

$e^x - e^{-x} = 2x \left(1 + \dfrac{x^2}{\pi^2}\right)\left(1 + \dfrac{x^2}{4\pi^2}\right)\left(1 + \dfrac{x^2}{9\pi^2}\right) \cdots$


Since there are infinitely many factors, care has to be taken with this limit (for a 
justification see Exercise III.2.5). 

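Replacing $x$ by $ix$, we find the desired function $\sin x$ to the left. Thus we
have obtained the following famous formula in a more credible way.


(5.2) Theorem (Euler 1748, §158). The function $\sin x$ can be factorized as

$\sin x = x \left(1 - \dfrac{x^2}{\pi^2}\right)\left(1 - \dfrac{x^2}{4\pi^2}\right)\left(1 - \dfrac{x^2}{9\pi^2}\right) \cdots$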


The convergence of this product is illustrated in Fig. 5.5. We observe that the
convergence is better for smaller values of $|x|$.
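
The slow but steady convergence of the product of Theorem 5.2 can also be observed numerically. The following sketch (the function name `sine_product` and the chosen numbers of factors are our own assumptions) multiplies the first factors $1 - x^2/(k^2\pi^2)$ and compares the result with `math.sin`.

```python
import math

def sine_product(x: float, n_factors: int) -> float:
    """Partial product x * prod_{k=1}^{n} (1 - x^2 / (k^2 pi^2)) from Theorem 5.2."""
    p = x
    for k in range(1, n_factors + 1):
        p *= 1.0 - x * x / (k * k * math.pi ** 2)
    return p

for x in (0.5, 2.0):
    for n in (10, 100, 1000):
        approx = sine_product(x, n)
        print(x, n, approx, abs(approx - math.sin(x)))   # the error shrinks as n grows
```

With the same number of factors, the error at $x = 0.5$ comes out smaller than at $x = 2$, in agreement with the observation about smaller $|x|$.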



FIGURE 5.5. Convergence of the product of Theorem 5.2


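Wallis’ Product. We put $x = \pi/2$ in the formula of Theorem 5.2. This gives

$\sin\dfrac{\pi}{2} = 1 = \dfrac{\pi}{2} \left(1 - \dfrac{1}{4}\right)\left(1 - \dfrac{1}{16}\right)\left(1 - \dfrac{1}{36}\right) \cdots = \dfrac{\pi}{2} \cdot \dfrac{1 \cdot 3}{2 \cdot 2} \cdot \dfrac{3 \cdot 5}{4 \cdot 4} \cdot \dfrac{5 \cdot 7}{6 \cdot 6} \cdots$

and we obtain the famous product of Wallis (1655),

$\dfrac{\pi}{2} = \dfrac{2 \cdot 2}{1 \cdot 3} \cdot \dfrac{4 \cdot 4}{3 \cdot 5} \cdot \dfrac{6 \cdot 6}{5 \cdot 7} \cdot \dfrac{8 \cdot 8}{7 \cdot 9} \cdot \dfrac{10 \cdot 10}{9 \cdot 11} \cdots$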
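
Wallis' product converges slowly, which a short computation makes visible. This is a minimal sketch; the function name `wallis_partial` is hypothetical.

```python
import math

def wallis_partial(n: int) -> float:
    """Product of the first n factors (2k)(2k) / ((2k-1)(2k+1)) of Wallis' product."""
    p = 1.0
    for k in range(1, n + 1):
        p *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return p

for n in (10, 1000, 100000):
    print(n, wallis_partial(n), math.pi / 2 - wallis_partial(n))
```

Every factor is larger than 1, so the partial products increase toward $\pi/2$ from below.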


Remark. The original proof by Wallis starts from the fact that $\pi/2$ is the area below
$(1 - x^2)^{1/2}$ (between $-1$ and $+1$), followed by a complicated procedure of
interpolation based on the known areas for $(1 - x^2)^0$, $(1 - x^2)^1$, $(1 - x^2)^2, \ldots$.
Precisely this idea inspired Newton in his discovery of the general binomial theorem
as discussed in Sect. I.2.


Exercises 


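5.1 (Euler 1748, §185.) Set $x = \pi/6$ in the formula of Theorem 5.2 and obtain,
with the help of $\sin(\pi/6) = 1/2$, another product for $\pi/2$:

$\dfrac{\pi}{2} = \dfrac{3}{2} \cdot \dfrac{6 \cdot 6}{5 \cdot 7} \cdot \dfrac{12 \cdot 12}{11 \cdot 13} \cdot \dfrac{18 \cdot 18}{17 \cdot 19} \cdot \dfrac{24 \cdot 24}{23 \cdot 25} \cdots$

then insert $x = \pi/4$, multiply the obtained product by Wallis’ product, and
obtain the following interesting formula:

$\sqrt{2} = \dfrac{2 \cdot 2}{1 \cdot 3} \cdot \dfrac{6 \cdot 6}{5 \cdot 7} \cdot \dfrac{10 \cdot 10}{9 \cdot 11} \cdot \dfrac{14 \cdot 14}{13 \cdot 15} \cdot \dfrac{18 \cdot 18}{17 \cdot 19} \cdots$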

5.2 (Euler, Introductio §166, 168). Generalize (5.19) and (5.21) in the following
way: let

$1 + A_1 z + A_2 z^2 + A_3 z^3 + \ldots = (1 + \alpha_1 z)(1 + \alpha_2 z)(1 + \alpha_3 z) \cdots$

(here $z$ stands for $-x^2$ in Theorem 5.2), and define the sums of the powers

$S_1 = \alpha_1 + \alpha_2 + \alpha_3 + \ldots$

$S_2 = \alpha_1^2 + \alpha_2^2 + \alpha_3^2 + \ldots$

$S_3 = \alpha_1^3 + \alpha_2^3 + \alpha_3^3 + \ldots,$

and so on. Then, present a “demonstratio gemina theorematis Neutoniani”

(5.30)
$S_1 = A_1$
$S_2 = A_1 S_1 - 2A_2$
$S_3 = A_1 S_2 - A_2 S_1 + 3A_3$
$S_4 = A_1 S_3 - A_2 S_2 + A_3 S_1 - 4A_4$

and deduce from these formulas and from Theorem 5.2 the following sums:

(5.31)
$1 + \dfrac{1}{2^2} + \dfrac{1}{3^2} + \dfrac{1}{4^2} + \ldots = \dfrac{\pi^2}{6} = \dfrac{2^2 \pi^2}{2 \cdot 2!} \cdot \dfrac{1}{6}$

$1 + \dfrac{1}{2^4} + \dfrac{1}{3^4} + \dfrac{1}{4^4} + \ldots = \dfrac{\pi^4}{90} = \dfrac{2^4 \pi^4}{2 \cdot 4!} \cdot \dfrac{1}{30}$

$1 + \dfrac{1}{2^6} + \dfrac{1}{3^6} + \dfrac{1}{4^6} + \ldots = \dfrac{\pi^6}{945} = \dfrac{2^6 \pi^6}{2 \cdot 6!} \cdot \dfrac{1}{42}$

$1 + \dfrac{1}{2^8} + \dfrac{1}{3^8} + \dfrac{1}{4^8} + \ldots = \dfrac{\pi^8}{9450} = \dfrac{2^8 \pi^8}{2 \cdot 8!} \cdot \dfrac{1}{30}.$

Remark. Actually, Euler wrote these expressions a little differently, and
the connection with the “Bernoulli numbers” (see Sect. II.10 below) became
clear to him only a couple of years later (1755, Institutiones Calculi
Differentialis, Caput V, §124, 125, 151, “ingrediuntur in expressiones
summarum. . .”).

5.3 (Euler 1748, §169). Show, either by a proof similar to the preceding one
(starting from the roots of $z^n + 1 = 0$), or by using $\cos x = \sin 2x/(2 \sin x)$,
that

$\cos x = \prod_{k=1}^{\infty} \left(1 - \dfrac{4x^2}{(2k-1)^2\pi^2}\right) = \left(1 - \dfrac{4x^2}{\pi^2}\right)\left(1 - \dfrac{4x^2}{9\pi^2}\right)\left(1 - \dfrac{4x^2}{25\pi^2}\right) \cdots$

Obtain by using this product expressions such as

(5.32)    $1 + \dfrac{1}{3^4} + \dfrac{1}{5^4} + \dfrac{1}{7^4} + \ldots = \dfrac{\pi^4}{96}.$

Show that (5.32) can also be obtained directly from (5.31).

5.4 (Euler 1748, §189-198). Take the logarithm of the formula of Theorem 5.2
(which transforms the product into a sum) and derive ingenious ways of computing

$\ln(\sin(x))$

by using the expansions (5.31).

5.5 Using Cardano’s formula (1.14) compute all roots of

(5.33)    $x^3 - 5x + 2 = 0.$

In spite of the fact that all three roots are real, one has to compute the cube
roots of a complex number.

5.6 Simplify the computation of the roots of (5.33) by the following idea (Viète
1591a): set $x = \mu \cos\alpha$ and replace $\cos\alpha$ by $x/\mu$ in the identity $\cos 3\alpha =
4\cos^3\alpha - 3\cos\alpha$ in order to get

$x^3 - \dfrac{3\mu^2}{4}\,x - \dfrac{\mu^3}{4} \cos 3\alpha = 0.$

Compare this equation with (5.33) to obtain $\mu$, $\alpha$, and $x$.


I.6 Continued Fractions


The theory of continued fractions is one of the most useful theories in Arithmetic
. . . since it is absent from most works on Arithmetic and Algebra, it
may not be well known among geometers. I would be satisfied if I were
able to contribute to make it slightly more familiar.

(Lagrange 1793, Oeuvres, vol. 7, p. 6-7)
We say therefore; that the Circle is to the Square of the Diameter, as 1 to
$\frac{3}{2} \times \frac{3}{4} \times \frac{5}{4} \times \frac{5}{6} \times \frac{7}{6} \times$ &c, infinitely. Or as 1 to

$1 + \cfrac{1}{2 + \cfrac{9}{2 + \cfrac{25}{2 + \cfrac{49}{2 + \text{\&c, infinitely.}}}}}$

How these Approximations were obtained . . . would be too long here to
insert; but may by those be seen, who please to consult that Treatise.

(J. Wallis 1685, A Treatise of Algebra, p. 318)


After having seen the use of infinite sums and infinite products in analysis, we 
now discuss a third possibility of an “infinitorum” process, infinite quotients, i.e., 
continued fractions. 


Origins 

The Euclidean Algorithm. This algorithm for the computation of the greatest 
common divisor of two integers has been known for more than 2000 years (Euclid, 
~ 300 B.C., Elements, Book VII, Propositions 1 and 2). Let two positive integers 
be given, for example 105 and 24. We divide the larger by the smaller and obtain 
the quotient 4 with remainder 9, i.e., 

105/24 = 4 + 9/24. 


We now continue the process with the divisor and the remainder: 

24/9 = 2 + 6/9, 9/6 = 1 + 3/6, 6/3 = 2. 


The algorithm must stop, since the remainders form a strictly decreasing sequence 
of positive integers. The last nonzero remainder (here 3) is the greatest common 
divisor we were looking for, and by combining successive steps we get 


(6.1)    $\dfrac{105}{24} = 4 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{2}}}$
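
The Euclidean algorithm described above translates almost literally into code. The sketch below (the function name `continued_fraction` is our own) records the successive quotients, which are exactly the numbers appearing in the continued fraction of $105/24$, and returns the greatest common divisor as a by-product.

```python
def continued_fraction(a: int, b: int) -> tuple[list[int], int]:
    """Quotients of the Euclidean algorithm for a/b, together with gcd(a, b)."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)      # a = q*b + r with 0 <= r < b
        quotients.append(q)
        a, b = b, r              # continue with the divisor and the remainder
    return quotients, a          # the last nonzero remainder is the gcd

print(continued_fraction(105, 24))   # ([4, 2, 1, 2], 3)
```

The loop must stop because the remainders form a strictly decreasing sequence of nonnegative integers, as argued in the text.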



Irrational Numbers. If this form of the Euclidean algorithm (repeatedly subtract
the integer part and invert) is applied to an irrational number, it cannot terminate,
since a finite expression as in (6.1) must be rational. For example, with $\alpha = \sqrt{2}$
we obtain

$\sqrt{2} = 1.4142\ldots = 1 + 0.4142\ldots = 1 + \dfrac{1}{2.4142\ldots} = 1 + \dfrac{1}{2 + 0.4142\ldots}.$

The reappearance of the digits of $\sqrt{2}$ in the last quotient is no surprise, since $\sqrt{2}$
satisfies precisely $\alpha = 1 + 1/(1 + \alpha)$ (multiply by $1 + \alpha$ to see this). Continuing,
we obtain the following formula of Bombelli from 1572:

(6.2)    $\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \ldots}}}$

The simplest of all sequences is obtained from the “golden mean”, which gives

(6.3)    $\dfrac{1 + \sqrt{5}}{2} = 1.61803\ldots = 1 + \dfrac{1}{1.61803\ldots} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \ldots}}}$

Further examples are as follows:

$\sqrt{3} = 1 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{2 + \ldots}}}}$

The quotients 1, 1, 2, 1, 2, 1, 2, 1, . . . which appear for $\sqrt{3}$ are periodic, those
for $e$ and for $(e - 1)/(e + 1)$ also exhibit a regular behaviour. We shall explain this
below for $(e - 1)/(e + 1)$, which is $\tanh(1/2)$ (cf. Eq. (6.31) below). However,
the regularity for $e$ is trickier (see Hurwitz, Werke 2, p. 130). No regularity at all
appears for the quotients of $\pi$, even if we compute thousands of them (Lambert
(1770a) computed 27, Lochs (1963) computed 968).
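
The periodic quotients of square roots can be reproduced exactly, without rounding errors. The sketch below does not follow the decimal computation shown above; instead it uses the standard integer recurrence for quadratic irrationals, in which every intermediate number has the form $(\sqrt{n} + m)/d$ with integers $m$ and $d$. The function name `sqrt_quotients` is our own.

```python
import math

def sqrt_quotients(n: int, count: int) -> list[int]:
    """First `count` quotients of the regular continued fraction of sqrt(n), n not a square."""
    a0 = math.isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0               # current number is (sqrt(n) + m) / d, integer part a
    out = [a0]
    for _ in range(count - 1):
        m = d * a - m                # subtract the integer part ...
        d = (n - m * m) // d         # ... and invert (this division is always exact)
        a = (a0 + m) // d            # integer part of the new number
        out.append(a)
    return out

print(sqrt_quotients(2, 6))          # [1, 2, 2, 2, 2, 2]
print(sqrt_quotients(3, 8))          # [1, 1, 2, 1, 2, 1, 2, 1]
```

A floating-point version of "subtract the integer part and invert" would also work for the first few quotients, but rounding errors are amplified at every inversion, so it should not be trusted for long expansions.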

Lord Brouncker’s Fraction for $\pi/4$. One year after the discovery of Wallis’
product for $\pi$, Lord Brouncker succeeded in transforming it into an interesting
continued fraction (see the quotation above and Eq. (6.23) below). This result
inspired Wallis to include a theory on continued fractions on the last two pages of
his Arithmetica Infinitorum (1655, see Opera, vol. I, p. 474-475).

Lambert’s Continued Fraction for $\tan x$.

But the incentive for seeking these formulas came from Euler’s Analysis
infinitorum, where the expression . . . appears in the form of an example.

(Lambert 1770a)

As we have seen in Sect. I.4, the function $\tan x = \sin x/\cos x$ does not have a
particularly simple expansion into an infinite series. We start from

$\tan x = \dfrac{\sin x}{\cos x} = \dfrac{x - x^3/6 + x^5/120 - \ldots}{1 - x^2/2 + x^4/24 - \ldots} = \cfrac{x}{\dfrac{1 - x^2/2 + x^4/24 - \ldots}{1 - x^2/6 + x^4/120 - \ldots}}.$

For $x \to 0$, the denominator tends to 1. We therefore subtract 1 and obtain

$\tan x = \cfrac{x}{1 - \dfrac{x^2/3 - x^4/30 + \ldots}{1 - x^2/6 + x^4/120 - \ldots}} = \cfrac{x}{1 - \cfrac{x^2}{\dfrac{1 - x^2/6 + \ldots}{1/3 - x^2/30 + \ldots}}}.$

Here, for $x \to 0$, the last denominator tends to 3. Subtracting 3 we then obtain

$\tan x = \cfrac{x}{1 - \cfrac{x^2}{3 - \dfrac{x^2/15 - \ldots}{1/3 - x^2/30 + \ldots}}}.$

Continuing like this, we find that the subsequent denominators are 5, then 7, and
so on. For an 18th century man (Lambert 1768) there is then no doubt that the
following formula is true in general:

(6.6)    $\tan x = \cfrac{x}{1 - \cfrac{x^2}{3 - \cfrac{x^2}{5 - \cfrac{x^2}{7 - \cfrac{x^2}{9 - \ldots}}}}}$

A couple of decades later, Legendre (1794) gave a complete proof (see Exercise
6.6).
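
Lambert's continued fraction, $\tan x = x/(1 - x^2/(3 - x^2/(5 - \ldots)))$, converges remarkably fast, as a small experiment shows. The sketch below (the function name `tan_cf` and the truncation parameter `depth` are our own) evaluates the fraction from the bottom up after cutting it off at the denominator $2 \cdot \text{depth} - 1$.

```python
import math

def tan_cf(x: float, depth: int) -> float:
    """Lambert's fraction x/(1 - x^2/(3 - x^2/(5 - ...))), truncated at denominator 2*depth - 1."""
    value = 2 * depth - 1                       # innermost denominator
    for k in range(depth - 1, 0, -1):
        value = (2 * k - 1) - x * x / value     # work outward: 2k-1 minus x^2 over what is below
    return x / value

for depth in (2, 4, 6, 10):
    print(depth, tan_cf(1.0, depth), abs(tan_cf(1.0, depth) - math.tan(1.0)))
```

No attempt is made here to guard against a zero denominator, which can occur for special values of `x` and `depth`; for the purposes of this illustration the sketch simply assumes it does not.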


An expression of the type

(6.7)    $q_0 + \cfrac{p_1}{q_1 + \cfrac{p_2}{q_2 + \cfrac{p_3}{q_3 + \ldots}}}$

is called a continued fraction. The fractions $p_1/q_1, p_2/q_2, p_3/q_3, \ldots$ are called the
partial quotients of the continued fraction. If all $p_k = 1$, the continued fraction is
called regular.

Convergents


If the continued fraction (6.7) is truncated at its $k$th quotient, we obtain a rational
number

(6.8)    $q_0 + \cfrac{p_1}{q_1 + \cfrac{p_2}{q_2 + \ldots + \cfrac{p_k}{q_k}}}$

which is called the $k$th convergent of the continued fraction. We want to write
these rational numbers as quotients of two integers. The first cases are easy:

(6.9a)    $q_0 + \dfrac{p_1}{q_1} = \dfrac{q_0 q_1 + p_1}{q_1}$

(6.9b)    $q_0 + \cfrac{p_1}{q_1 + \cfrac{p_2}{q_2}} = \dfrac{q_0 q_1 q_2 + q_0 p_2 + p_1 q_2}{q_1 q_2 + p_2}$

Let $A_k$ denote the numerator, and $B_k$ the denominator, when the expression (6.8)
is evaluated in this manner. From (6.9) we have

$A_0 = q_0, \qquad B_0 = 1,$

$A_1 = q_0 q_1 + p_1, \qquad B_1 = q_1,$

$A_2 = q_0 q_1 q_2 + q_0 p_2 + p_1 q_2, \qquad B_2 = q_1 q_2 + p_2.$

We now look at these formulas, as Euler says, “with a bit of attention” (tamen
attendenti statim patebit), and discover the following beautiful structure:

(6.10)    $A_2 = q_2 A_1 + p_2 A_0, \qquad B_2 = q_2 B_1 + p_2 B_0.$

For the computation of $A_3$ and $B_3$, whose quotient must be

$q_0 + \cfrac{p_1}{q_1 + \cfrac{p_2}{q_2 + p_3/q_3}}$

we could, by comparing with (6.9b), take the formulas for $A_2$ and $B_2$ and replace
everywhere $q_2$ by the quantity $q_2 + p_3/q_3$. But the expressions obtained in this
manner would in general not be integers. We therefore multiply both numbers by
$q_3$, which does not alter their quotient, and have from (6.10),

$A_3 = \left((q_2 + p_3/q_3) A_1 + p_2 A_0\right) \cdot q_3, \qquad B_3 = \left((q_2 + p_3/q_3) B_1 + p_2 B_0\right) \cdot q_3.$

These two expressions become, after simplification,

$A_3 = q_3 A_2 + p_3 A_1, \qquad B_3 = q_3 B_2 + p_3 B_1.$
This structure now repeats again and again and we have

(6.1) Theorem (Wallis 1655, Euler 1737b). The numerators and denominators of
the convergents (6.8) are determined recursively by

(6.11)    $A_k = q_k A_{k-1} + p_k A_{k-2}, \qquad B_k = q_k B_{k-1} + p_k B_{k-2}$

(6.12)    $A_{-1} = 1, \quad A_0 = q_0, \quad A_1 = q_1 q_0 + p_1; \qquad B_{-1} = 0, \quad B_0 = 1, \quad B_1 = q_1.$
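
The recursion of Theorem 6.1, $A_k = q_k A_{k-1} + p_k A_{k-2}$ and $B_k = q_k B_{k-1} + p_k B_{k-2}$ with $A_{-1} = 1$, $A_0 = q_0$, $B_{-1} = 0$, $B_0 = 1$, is a two-line loop in code. The sketch below (the function name `convergents` and its argument layout are our own choices) returns the pairs $(A_k, B_k)$ as exact integers.

```python
def convergents(q0: int, p: list[int], q: list[int]) -> list[tuple[int, int]]:
    """Pairs (A_k, B_k), k = 0..n, for q0 + p[0]/(q[0] + p[1]/(q[1] + ...))."""
    a_prev, a = 1, q0            # A_{-1}, A_0
    b_prev, b = 0, 1             # B_{-1}, B_0
    result = [(a, b)]
    for pk, qk in zip(p, q):     # pk, qk run through p_1, q_1, p_2, q_2, ...
        a_prev, a = a, qk * a + pk * a_prev
        b_prev, b = b, qk * b + pk * b_prev
        result.append((a, b))
    return result

# Golden mean: all p_k = q_k = 1, giving quotients of Fibonacci numbers.
print(convergents(1, [1] * 5, [1] * 5))
# [(1, 1), (2, 1), (3, 2), (5, 3), (8, 5), (13, 8)]

# Regular continued fraction of pi with quotients 3, 7, 15, 1, 292.
print(convergents(3, [1] * 4, [7, 15, 1, 292]))
# [(3, 1), (22, 7), (333, 106), (355, 113), (103993, 33102)]
```

The pairs are deliberately not reduced to lowest terms, so that they agree with $A_k$ and $B_k$ as defined by the recursion even when the $p_k$ are different from 1.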


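(6.2) Examples. Equations (6.11) and (6.12) applied to the above examples lead
to sequences of rational numbers,

$\dfrac{1 + \sqrt{5}}{2} \approx \dfrac{1}{1}, \dfrac{2}{1}, \dfrac{3}{2}, \dfrac{5}{3}, \dfrac{8}{5}, \dfrac{13}{8}, \dfrac{21}{13}, \dfrac{34}{21}, \dfrac{55}{34}, \dfrac{89}{55}, \dfrac{144}{89}, \ldots$

$\sqrt{2} \approx \dfrac{1}{1}, \dfrac{3}{2}, \dfrac{7}{5}, \dfrac{17}{12}, \dfrac{41}{29}, \dfrac{99}{70}, \dfrac{239}{169}, \dfrac{577}{408}, \dfrac{1393}{985}, \dfrac{3363}{2378}, \ldots$

$\sqrt{3} \approx \dfrac{1}{1}, \dfrac{2}{1}, \dfrac{5}{3}, \dfrac{7}{4}, \dfrac{19}{11}, \dfrac{26}{15}, \dfrac{71}{41}, \dfrac{97}{56}, \dfrac{265}{153}, \dfrac{362}{209}, \dfrac{989}{571}, \dfrac{1351}{780}, \ldots$

$e \approx \dfrac{2}{1}, \dfrac{3}{1}, \dfrac{8}{3}, \dfrac{11}{4}, \dfrac{19}{7}, \dfrac{87}{32}, \dfrac{106}{39}, \dfrac{193}{71}, \dfrac{1264}{465}, \dfrac{1457}{536}, \dfrac{2721}{1001}, \ldots$

$\pi \approx \dfrac{3}{1}, \dfrac{22}{7}, \dfrac{333}{106}, \dfrac{355}{113}, \dfrac{103993}{33102}, \dfrac{104348}{33215}, \ldots$

which (see Fig. 6.1) rapidly approach the original irrational numbers.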



FIGURE 6.1. Errors for convergents $A_k/B_k$ (logarithmic scale)

The approximations for $\sqrt{2}$ and $\sqrt{3}$ were known in antiquity (Archimedes
used $265/153 < \sqrt{3} < 1351/780$ without further comment). The two convergents
22/7 (Archimedes) and 355/113 (Tsu Chung-chih around 480 in China,
Adrianus Metius 1571-1635 in Europe) for $\pi$ are of a better than average quality.
Explanation: the first denominator $q_{k+1}$ to be neglected is large (15, respectively,
292). Two other very precise approximations for $\pi$, which are the 11th and
26th convergents respectively, were calculated in 1766 in Japan by Y. Arima
as 5419351/1725033 and 428224593349304/136308121570117 (see Hayashi
1902). On the other hand, for the golden mean (all $q_k = 1$) we have slow convergence.
Here, (6.11) becomes the recursion formula for the Fibonacci numbers
(Leonardo da Pisa 1170-1250, also called Fibonacci).

Some convergents of the continued fraction (6.6) for $\tan x$,

(6.13)    $\dfrac{x}{1}, \quad \dfrac{3x}{3 - x^2}, \quad \dfrac{15x - x^3}{15 - 6x^2}, \quad \dfrac{105x - 10x^3}{105 - 45x^2 + x^4}, \quad \dfrac{945x - 105x^3 + x^5}{945 - 420x^2 + 15x^4}, \ldots$

are displayed in Fig. 6.2 and nicely approach the function $\tan x$, even beyond the
singularities $x = \pi/2, 3\pi/2, \ldots$.



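Infinite Series from Continued Fractions. The difference of two successive convergents
satisfies

(6.14)    $\dfrac{A_{k+1}}{B_{k+1}} - \dfrac{A_k}{B_k} = \dfrac{A_{k+1} B_k - A_k B_{k+1}}{B_k B_{k+1}} = (-1)^k \dfrac{p_1 p_2 \cdots p_{k+1}}{B_k B_{k+1}}.$

The last identity is seen as follows: using (6.11) we have

$A_{k+1} B_k - A_k B_{k+1} = (q_{k+1} A_k + p_{k+1} A_{k-1}) B_k - A_k (q_{k+1} B_k + p_{k+1} B_{k-1})$
$\qquad = -p_{k+1} (A_k B_{k-1} - A_{k-1} B_k) = \ldots$
$\qquad = p_2 \cdots p_{k+1} (-1)^k (A_1 B_0 - A_0 B_1)$

and $(A_1 B_0 - A_0 B_1) = p_1$ because of (6.12). Writing the convergent $A_k/B_k$ as

$\dfrac{A_k}{B_k} = \left(\dfrac{A_k}{B_k} - \dfrac{A_{k-1}}{B_{k-1}}\right) + \left(\dfrac{A_{k-1}}{B_{k-1}} - \dfrac{A_{k-2}}{B_{k-2}}\right) + \ldots + \left(\dfrac{A_1}{B_1} - \dfrac{A_0}{B_0}\right) + \dfrac{A_0}{B_0},$

we see from (6.14) that

(6.15)    $\dfrac{A_k}{B_k} = q_0 + \dfrac{p_1}{B_1} - \dfrac{p_1 p_2}{B_1 B_2} + \dfrac{p_1 p_2 p_3}{B_2 B_3} - \ldots + (-1)^{k-1} \dfrac{p_1 p_2 \cdots p_k}{B_{k-1} B_k}$

and we have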

and we have 

(6.3) Theorem. The convergents of (6.7) are the truncated sums of the series

(6.16)    $q_0 + \dfrac{p_1}{B_1} - \dfrac{p_1 p_2}{B_1 B_2} + \dfrac{p_1 p_2 p_3}{B_2 B_3} - \dfrac{p_1 p_2 p_3 p_4}{B_3 B_4} + \ldots + (-1)^{k-1} \dfrac{p_1 p_2 \cdots p_k}{B_{k-1} B_k} + \ldots$

For regular continued fractions (all $p_k = 1$) we have

(6.16’)    $q_0 + \dfrac{1}{B_1} - \dfrac{1}{B_1 B_2} + \dfrac{1}{B_2 B_3} - \dfrac{1}{B_3 B_4} + \ldots$

Since $1/(B_{k-1} B_k)$ is the smallest possible distance between two different rational
numbers with denominators $B_{k-1}$ and $B_k$, the interval between $A_{k-1}/B_{k-1}$ and
$A_k/B_k$ cannot contain a rational number whose denominator is not larger than
$B_k$.


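Continued Fractions from Infinite Series. Let

(6.17)    $\dfrac{1}{c_1} - \dfrac{1}{c_2} + \dfrac{1}{c_3} - \dfrac{1}{c_4} + \dfrac{1}{c_5} - \ldots$

be a given series with integer $c_k$; we want to find integers $p_i$, $q_i$ such that the series
(6.17) coincides term by term with (6.16) (with $q_0 = 0$).

Solution. We put $p_1 = 1$ and $q_1 = B_1 = c_1$. Then, we divide two successive terms
of (6.16) (so that the products of $p_i$ simplify), which gives

(6.18)    $c_{k-1} B_k = c_k p_k B_{k-2}.$

This resembles, apart from the factors $c_{k-1}$ and $c_k$, Eq. (6.11). We therefore
subtract from (6.18) Eq. (6.11), once multiplied by $c_{k-1}$, once by $c_k$, and obtain

$c_{k-1} q_k B_{k-1} = (c_k - c_{k-1}) p_k B_{k-2}$
$(c_{k-1} - c_k) B_k = -c_k q_k B_{k-1}.$

In the first formula we replace $k$ by $k + 1$ and then divide the two expressions.
This eliminates the $B_k$'s and gives

(6.19)    $\dfrac{c_k q_{k+1}}{c_k - c_{k-1}} = \dfrac{(c_{k+1} - c_k)\, p_{k+1}}{c_k q_k}.$

The $p_i$, $q_i$ are, of course, not uniquely defined. Since we want them to be integers,
a natural choice that satisfies (6.19) is

(6.20)    $p_{k+1} = c_k^2, \qquad q_{k+1} = c_{k+1} - c_k$

for $k \ge 1$. Thus, we have the following formula of Euler (1748, §369):

(6.21)    $\dfrac{1}{c_1} - \dfrac{1}{c_2} + \dfrac{1}{c_3} - \dfrac{1}{c_4} + \ldots = \cfrac{1}{c_1 + \cfrac{c_1^2}{c_2 - c_1 + \cfrac{c_2^2}{c_3 - c_2 + \cfrac{c_3^2}{c_4 - c_3 + \ldots}}}}$

When applied to two well-known series (see Sects. I.3 and I.4), this formula gives

(6.22)    $\ln 2 = 1 - \dfrac{1}{2} + \dfrac{1}{3} - \dfrac{1}{4} + \ldots = \cfrac{1}{1 + \cfrac{1^2}{1 + \cfrac{2^2}{1 + \cfrac{3^2}{1 + \ldots}}}}$

(6.23)    $\dfrac{\pi}{4} = 1 - \dfrac{1}{3} + \dfrac{1}{5} - \dfrac{1}{7} + \ldots = \cfrac{1}{1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \ldots}}}}$

The second continued fraction is the one found by Lord Brouncker, obtained here
from Leibniz’s series.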
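
The claim that the continued fraction reproduces the series term by term can be tested with exact rational arithmetic. The sketch below assumes the choice $p_1 = 1$, $q_1 = c_1$, $p_{k+1} = c_k^2$, $q_{k+1} = c_{k+1} - c_k$ (with $q_0 = 0$) and compares the convergents, computed by the recursion of Theorem 6.1, with the partial sums of $1/c_1 - 1/c_2 + 1/c_3 - \ldots$. The function names are our own.

```python
from fractions import Fraction

def euler_cf_convergents(c: list[int]) -> list[Fraction]:
    """Convergents of the fraction with p_1 = 1, q_1 = c_1, p_{k+1} = c_k^2, q_{k+1} = c_{k+1} - c_k."""
    p = [1] + [ck * ck for ck in c[:-1]]
    q = [c[0]] + [c[k + 1] - c[k] for k in range(len(c) - 1)]
    a_prev, a = 1, 0             # A_{-1} = 1, A_0 = q_0 = 0
    b_prev, b = 0, 1             # B_{-1} = 0, B_0 = 1
    out = []
    for pk, qk in zip(p, q):
        a_prev, a = a, qk * a + pk * a_prev
        b_prev, b = b, qk * b + pk * b_prev
        out.append(Fraction(a, b))
    return out

def alternating_partial_sums(c: list[int]) -> list[Fraction]:
    """Partial sums of 1/c_1 - 1/c_2 + 1/c_3 - ..."""
    total, out = Fraction(0), []
    for k, ck in enumerate(c):
        total += Fraction((-1) ** k, ck)
        out.append(total)
    return out

leibniz = [2 * k - 1 for k in range(1, 9)]          # c_k = 1, 3, 5, ... (series for pi/4)
print(euler_cf_convergents(leibniz)[:4])            # 1, 2/3, 13/15, 76/105
print(euler_cf_convergents(leibniz) == alternating_partial_sums(leibniz))
```

Since the convergents coincide with the partial sums, Brouncker's fraction converges exactly as slowly as Leibniz's series; the transformation changes the form, not the speed.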

Similarly, we prove (Euler 1748, §370)

(6.24) 


whence, for example, 


(6.25) 








Irrationality 


I have good reason to doubt that the present article will be read, or even 
understood, by those who should profit most by it, namely those who spend 
time and efforts in trying to square the circle. There will always be enough 
such persons . . . who understand very little of geometry . . . 

(Lambert 1770a) 


One of the great unsolved problems of classical analysis was the quadrature of
the circle (i.e., the construction of $\pi$) by ruler and compass. Lambert was one of
the first to believe that this construction, which challenged mathematicians for
2000 years, was impossible. A first hint toward this result would be the fact that
$\pi$ is irrational. We are therefore interested in a theorem that states that an infinite
continued fraction represents an irrational number.

First difficulty. It can happen that a continued fraction represents no number at all.
To see this, we start from the series

(6.26)    $\dfrac{2}{1} - \dfrac{3}{2} + \dfrac{4}{3} - \dfrac{5}{4} + \dfrac{6}{5} - \dfrac{7}{6} + \ldots$

Since its terms approach $\pm 1$, it clearly does not converge. To obtain a corresponding
continued fraction, we put $c_k = k/(k + 1)$ (see (6.17)) and obtain from (6.19),
after simplification,

$\dfrac{p_{k+1}}{q_{k+1} \cdot q_k} = k^3 (k + 2).$

With $p_{k+1} = k^3 (k + 2)$ and $q_k = 1$ we have integer coefficients and see that the
convergents of the continued fraction

(6.27)    $\cfrac{2}{1 + \cfrac{1^3 \cdot 3}{1 + \cfrac{2^3 \cdot 4}{1 + \cfrac{3^3 \cdot 5}{1 + \ldots}}}}$

do not tend to a real number.


Second difficulty. There are infinite continued fractions that represent a rational
number. For example, we have $2 = 1 + 2/2$ and obtain, by inserting 2 repeatedly,

(6.28)    $2 = 1 + \cfrac{2}{1 + \cfrac{2}{1 + \cfrac{2}{1 + \ldots}}}$

which is rational.


(6.4) Theorem. If the $p_j$ and $q_j$ are integers and if from a certain index $j \ge j_0$
onward

(6.29)    $0 < p_j \le q_j,$

then the continued fraction (6.7) tends to a number $\alpha$ that is irrational.

Proof. Without loss of generality we may assume that $0 < p_j \le q_j$ is satisfied for
all $j$. Otherwise, we consider the continued fraction starting with $p_{j_0}/q_{j_0}$. Convergence
of this continued fraction and its irrationality are equivalent to convergence
and irrationality of the original one.

The assumption that $0 < p_j \le q_j$ guarantees that the convergents of the
continued fraction tend to a real number. This is a consequence of the “Leibniz
criterion” and will be discussed in Sect. III.2.

Following an idea of Legendre (1794, Eléments de Géométrie, Note IV), we
now write the continued fraction (6.7) without $q_0$ as

(6.30)    $\alpha = \dfrac{p_1}{q_1 + \beta}$    with    $\beta = \cfrac{p_2}{q_2 + \cfrac{p_3}{q_3 + \ldots}}.$

Since $q_1 \ge p_1$ and $\beta > 0$ we have $\alpha < 1$. Suppose now that $\alpha = B/A$ is rational
with $0 < B < A$. A simple reformulation of (6.30) yields

$\beta = \dfrac{p_1 - q_1 \alpha}{\alpha} = \dfrac{A p_1 - B q_1}{B}$

so that $\beta$ is expressed as a rational number with denominator smaller than that
of $\alpha$. If we repeat the same reasoning with $\beta = p_2/(q_2 + \gamma)$ and so on, we find
smaller and smaller denominators that are all integers. This is not possible an
infinite number of times. □


Negative $p_j$. The conclusion of Theorem 6.4 is also valid, if (6.29) is replaced by

(6.29’)    $2|p_j| \le q_j - 1.$

This is seen by repeated application of the identity (valid for $p_j < 0$)

$q_{j-1} + \dfrac{p_j}{q_j + \beta} = (q_{j-1} - 1) + \cfrac{1}{1 + \cfrac{|p_j|}{q_j - |p_j| + \beta}},$

which, under the assumption (6.29’), transforms the continued fraction into another
one satisfying (6.29).


(6.5) Theorem (Lambert 1768, 1770a, Legendre 1794). For each rational $x$ ($x \ne 0$)
the value $\tan x$ is irrational.

Proof. Suppose that $x = m/n$ is rational and insert this into (6.6):

(6.31)    $\tan\dfrac{m}{n} = \cfrac{m/n}{1 - \cfrac{m^2/n^2}{3 - \cfrac{m^2/n^2}{5 - \ldots}}} = \cfrac{m}{n - \cfrac{m^2}{3n - \cfrac{m^2}{5n - \cfrac{m^2}{7n - \ldots}}}}$

On the right we have a continued fraction with integer coefficients. Since the factors
1, 3, 5, 7, 9, . . . approach infinity, condition (6.29’) is, for all $m$ and $n$, satisfied
beyond a certain index $j_0$. □

The same result is true for the arctan function; indeed, for $y$ rational, $x =
\arctan y$ must be irrational, otherwise $y = \tan x$ would be irrational by Theorem 6.5.
In particular, $\pi = 4 \arctan 1$ must be irrational.

The proof of the analogous result for the hyperbolic tangent $\tanh x =
(e^x - e^{-x})/(e^x + e^{-x}) = (e^{2x} - 1)/(e^{2x} + 1)$ is even easier, since all minus
signs in (6.31) become plus signs. Inverting the last formula, we have $e^x =
(1 + \tanh(x/2))/(1 - \tanh(x/2))$, and still obtain the irrationality of $e^x$ and $\ln x$
for rational $x \ne 0$ and $x \ne 1$, respectively.

Exercises 

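6.1 Show that with the use of matrix notation, the numerators and denominators
$A_k$ and $B_k$ of the convergents (6.8) can be expressed in the following form:

$\begin{pmatrix} A_k & A_{k-1} \\ B_k & B_{k-1} \end{pmatrix} = \begin{pmatrix} q_0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} q_1 & 1 \\ p_1 & 0 \end{pmatrix} \begin{pmatrix} q_2 & 1 \\ p_2 & 0 \end{pmatrix} \cdots \begin{pmatrix} q_{k-1} & 1 \\ p_{k-1} & 0 \end{pmatrix} \begin{pmatrix} q_k & 1 \\ p_k & 0 \end{pmatrix}$

6.2 Compute numerically the regular continued fractions for the numbers

$\sqrt{2}, \sqrt{3}, \sqrt{5}, \sqrt{6}, \sqrt{7}, \sqrt[3]{2}, \sqrt[3]{3}, \sqrt[3]{4}, \sqrt[3]{5}, \sqrt[3]{6}, \sqrt[3]{7}$

and discover a significant difference between the square and the cube roots.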

6.3 Show that 



are solutions of a second-degree equation. Compute their values. 

6.4 The length of an astronomical year is (Euler 1748, §382)

365 days 5 hours 48′55″.

Compute the development of 5 hours 48′55″ (measured in days) into a regular
continued fraction and compute the corresponding convergents. Don’t
forget to give your valuable advice to Pope Gregory XIII for the reform of
his calendar.

6.5 Give a detailed proof of Eq. (6.24).

6.6 Prove formula (6.6).

Hint (Legendre 1794). Define

$\varphi(z) = 1 + \dfrac{a}{z} + \dfrac{a^2}{1 \cdot 2 \cdot z(z + 1)} + \dfrac{a^3}{1 \cdot 2 \cdot 3 \cdot z(z + 1)(z + 2)} + \ldots$

and show that $\varphi(z) - \varphi(z + 1) = \dfrac{a}{z(z + 1)}\, \varphi(z + 2)$. Next, define

(6.33)    $\psi(z) = \dfrac{a\, \varphi(z + 1)}{z\, \varphi(z)}$    such that    $\psi(z) = \dfrac{a}{z + \psi(z + 1)}.$

Iterating (6.33) leads to a continued fraction. Finally, put $a = x^2/4$ so that
$\varphi(1/2) = \cosh x$ and $x\varphi(3/2) = \sinh x$, and replace $x$ by $ix$. We note that
these formulas are related to continued fractions for hypergeometric functions
(Gauss, Heine, see Perron 1913, p. 313, 353).


J. Wallis 1616-1703 J.H. Lambert 1728-1777 

With permissions of Georg Olms Verlag Hildesheim and Univ. Bibl. Basel



II 

Differential and Integral Calculus 



The extent of this calculus is immense: it applies to curves both mechanical 
and geometrical; radical signs cause it no difficulty, and even are often con- 
venient; it extends to as many variables as one wishes; the comparison of 
infinitely small quantities of all sorts is easy. And it gives rise to an infinity 
of surprising discoveries concerning curved or straight tangents, questions 
De maximis & minimis , inflexion points and cusps of curves, envelopes, 
caustics from reflexion or refraction, &c. as we shall see in this work. 
(Marquis de L’Hospital 1696, Introduction to Analyse des infiniment petits) 

This chapter introduces the differential and integral calculus, the greatest inventions
of all time in mathematics. We explain the ideas of Leibniz, the Bernoullis,
and Euler. A rigorous treatment in the spirit of the 19th century will be the subject
of Sections III.5 and III.6.

As we see in the above illustration, this calculus sheds light on the obscure
machinery of scientific research.

II.1 The Derivative


And I dare say that this is not only the most useful and most general problem
in geometry that I know, but even that I ever desired to know.

(Descartes 1637, p. 342, Engl. transl. p. 95)
Isaac Newton was not a pleasant man. His relations with other academics
were notorious, with most of his later life spent embroiled in heated disputes
. . . A serious dispute arose with the German philosopher Gottfried
Leibniz. Both Leibniz and Newton had independently developed a branch
of mathematics called calculus, which underlies most of modern physics
. . . Following the death of Leibniz, Newton is reported to have declared
that he had taken great satisfaction in ‘breaking Leibniz’s heart’.

(Hawking 1988, A brief history of time, Bantam Editors, New York)
What contempt for the non-English! We have found these methods, without
any help from the English.

(Joh. Bernoulli 1735, Opera, vol. IV, p. 170)
What you report about Bernard Niewentijt is just small beer. Who could
refrain from laughing at his ridiculous hair-splitting about our calculus, as
if he were blind to its advantages.

(Letter of Joh. Bernoulli, quoted from Parmentier 1989, p. 316).
We shall call the function $fx$ a primitive function of the functions $f'x$, $f''x$,
&c. which derive from it, and we shall call these latter the derived functions
of the first one. (Lagrange 1797)

Problem. Let y = f(x) be a given curve. At each point x we wish to know the 

slope of the curve, the tangent or the normal to the curve. 

Motivations. 

- Calculation of the angles under which two curves intersect (Descartes); 

- construction of telescopes (Galilei), of clocks (Huygens 1673); 

- search for the maxima, minima of a function (Fermat 1638); 

- velocity and acceleration of a movement (Galilei 1638, Newton 1686); and 

- astronomy, verification of the Law of Gravitation (Kepler, Newton). 


The Derivative 

The Linear Function $y = ax + b$. In addition to the fixed value $x$, we consider the
perturbed value $x + \Delta x$. The corresponding $y$-values are $y = ax + b$ and
$y + \Delta y = a(x + \Delta x) + b$, hence $\Delta y = a\Delta x$. The slope of the line, defined by
$\Delta y/\Delta x$, is equal to $a$. Fig. 1.1 shows functions $y = ax + 1$ for different values of $a$.

FIGURE 1.1. Slopes in dependence of $a$ (panels for $a = 2, 1, 1/2, 0, -1/2, -1, -2$)

The Parabola $y = x^2$. If $x$ increases by $\Delta x$, then $y$ increases to $y + \Delta y =
(x + \Delta x)^2 = x^2 + 2x\Delta x + (\Delta x)^2$ so that (see Fig. 1.2a)

(1.1)    $\Delta y = 2x\Delta x + (\Delta x)^2.$

Therefore, the slope of the line connecting $(x, y)$ with $(x + \Delta x, y + \Delta y)$ is equal
to $2x + \Delta x$. If $\Delta x$ tends to zero, this slope will approach that of the tangent to the
parabola.
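
The statement that the secant slope of the parabola equals $2x + \Delta x$ can be verified with exact rational arithmetic, so that no rounding obscures the picture. This is only an illustrative sketch; the function name `secant_slope` is our own.

```python
from fractions import Fraction

def secant_slope(f, x, dx):
    """Slope of the line through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx

x = Fraction(3)
for n in range(1, 6):
    dx = Fraction(1, 10 ** n)
    # For f(t) = t^2 the slope equals 2*x + dx exactly and approaches 2*x = 6.
    print(dx, secant_slope(lambda t: t * t, x, dx))
```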



FIGURE 1.2a. Tangent to parabola

FIGURE 1.2b. Tangent to parabola (Drawing of Joh. Bernoulli 1691/92) 1

Leibniz (1684) imagines that $\Delta x$ and $\Delta y$ become “infinitely small” (“tangentem
invenire, esse rectam ducere, quæ duo curvæ puncta distantiam infinite parvam
habentia, jungat, . . .”) and denotes them by $dx$ and $dy$. Then we neglect the term
$(dx)^2$, which is “infinitely smaller” than $2x\,dx$, and obtain, instead of (1.1),

(1.1')    $dy = 2x\,dx$    or    $\dfrac{dy}{dx} = 2x.$

Newton (1671, pub. 1736, p. 20) considers his variables $v, x, y, z$ “as gradually and
indefinitely increasing, . . . And the velocities by which every Fluent is increased
by its general motion, (which I may call Fluxions, . . .) I shall represent by the
same Letters pointed thus $\dot{v}, \dot{x}, \dot{y}, \dot{z}$”. Their values are obtained by “rejecting the
Terms ... as being equal to nothing”. Newton categorically refused the publication
(“Pray let none of my mathematical papers be printed w$^{th}$out my special licence”).
Jac. and Joh. Bernoulli re-invent the differential calculus a third time, based on
Leibniz’s obscure publication from 1684 (“une énigme plutôt qu’une explication”).
Joh. Bernoulli (1691/92) then gave private lessons on the new calculus to
the very noble Marquis de L’Hospital. For him, infinitely small quantities are just
quantities that can be added to finite quantities without altering their values and
curves are polygons with infinitely short sides. Furthermore, this greatest of all
teachers (besides his numerous sons and nephews and de L’Hospital, he also
introduced Euler to mathematics) held the opinion that too many explanations on
the infinitely small would rather trouble the understanding of those who are not
“accoutumés à de longues explications”.

1 Reproduced with permission of Univ. Bibl. Basel.

B. Nieuwentijt gives in 1694 a first criticism of the infinitely small (see the letter of Joh. Bernoulli quoted above), followed by a “Responsio” of Leibniz (in the July 1695 issue of the journal Acta Eruditorum).

Marquis de L’Hospital (1696) writes the famous book Analyse des infiniment pe- 
tits (see Fig. 1.3), which leads to the definitive breakthrough of the new calculus, 
even in France, where science was governed for many decades by the “Cartesians” 
(abbé Catelan, Papin, Rolle, . . . ). 



FIGURE 1.3. Drawing from de L’Hospital (1696), Analyse des infiniment petits²

Bishop Berkeley published the polemic article The Analyst in 1734 against the 
infinitely small (see the quotation in Sect. II.2 and Struik 1969, p. 333). 

Maclaurin (1742, Treatise of Fluxions, vol. II, p.420): “. . . investigate the ratio 
which is the limit ...” 

Euler (1755, Institutiones Calculi Differentialis) starts with two long chapters De differentiis finitis and De usu differentiarum in doctrina serierum, followed by six pages in Latin on the infinite, before daring to write “denotet $dx$ quantitatem infinite parvam” ($dx = 0$ and $a\,dx = 0$), but requires that “ratio geometrica $\frac{a\,dx}{dx} = \frac{a}{1}$ erit finita”. He favors Leibniz’s notation against Newton’s by saying that “. . . incommode hoc modo $\overset{\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot}{y}$ repraesentatur, cum nostro signandi modo $d^{10}y$ facillime comprehendatur”.

D’Alembert (1754, Encyclopédie) introduces a clear notion of the limit (“This 
limit is the value which the ratio z/n approaches more and more . . . Nothing is 
clearer than this idea; . . .”). 

Lagrange (1797) rejects the infinitely small straightaway and tries to base analysis on power series (“One knows the difficulties created by the assumption of infinitely small quantities, upon which Leibniz constructs his Calculus.”) He introduces the name derivative and uses for $dy/dx$ the notation (see quotation)

(1.2)    $y'$  or  $f'(x)$.

Cauchy (1823) condemns the Taylor series (counterexample $y = e^{-1/x^2}$, see Sect. III.7 below) and reestablishes the infinitely small as a limit.

² Reproduced with permission of Bibl. Publ. Univ. Genève.

Bolzano (1817) and Weierstrass (1861) bring the notion of limit to perfection with $\varepsilon$ and $\delta$ (see Chap. III).

F. Klein (1908) defends the educational value of the infinitely small (“The force of 
conviction inherent in such naive guiding reflections is, of course, different for dif- 
ferent individuals. Many — and I include myself here — find them very satisfying. 
Others, again, who are gifted only on the purely logical side, find them thoroughly 
meaningless ... In this connection, I should like to commend the Leibniz notation 
. . .”) 


Differentiation Rules 


His positis calculi regulae erunt tales: 

(Leibniz 1684) 

Sums and Constant Factors. Let $y(x) = a \cdot u(x) + b \cdot v(x)$, where $a$ and $b$ are constant factors. Setting $y + \Delta y = y(x + \Delta x)$, $u + \Delta u = u(x + \Delta x)$, $v + \Delta v = v(x + \Delta x)$, we have

$\Delta y = a \cdot \Delta u + b \cdot \Delta v$

and we get the differentiation rule

(1.3)    $y = au + bv \;\Rightarrow\; \frac{dy}{dx} = a\,\frac{du}{dx} + b\,\frac{dv}{dx}$  or  $y' = au' + bv'.$


Products. For the product of two functions $y(x) = u(x) \cdot v(x)$ we have

$y + \Delta y = u(x + \Delta x) \cdot v(x + \Delta x) = (u + \Delta u) \cdot (v + \Delta v) = uv + u\,\Delta v + v\,\Delta u + \Delta u\,\Delta v,$

which leads to $dy = u\,dv + v\,du$ “because $du\,dv$ is an infinitely small quantity when compared to the other terms $u\,dv$ & $v\,du$” (de L’Hospital 1696, p. 4) or

(1.4)    $y = u \cdot v \;\Rightarrow\; \frac{dy}{dx} = u\,\frac{dv}{dx} + v\,\frac{du}{dx}$  or  $y' = u'v + uv'.$


Examples. We write $x^3$ as a product $y = x^3 = x^2 \cdot x$ and the above formula yields $y' = x^2 \cdot 1 + x \cdot 2x = 3x^2$. Similarly, for the product $y = x^4 = x^3 \cdot x$ we get $y' = x^3 \cdot 1 + x \cdot 3x^2 = 4x^3$. By induction, we see in this way that for any positive integer $n$

(1.5)    $y = x^n \;\Rightarrow\; y' = n \cdot x^{n-1}.$


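Quotients. For the quotient $y(x) = u(x)/v(x)$ of two functions we have

$y + \Delta y = \frac{u + \Delta u}{v + \Delta v}.$

Subtracting $y$ on each side and using the geometric series for $(1 + \Delta v/v)^{-1}$ yields for $v \neq 0$

$\Delta y = \frac{u + \Delta u}{v + \Delta v} - \frac{u}{v} = \frac{v\,\Delta u - u\,\Delta v}{v^2 + v\,\Delta v} = \frac{v\,\Delta u - u\,\Delta v}{v^2}\left(1 - \frac{\Delta v}{v} + \frac{(\Delta v)^2}{v^2} - \ldots\right).$

Therefore, we have for $v \neq 0$

(1.6)    $y = \frac{u}{v} \;\Rightarrow\; \frac{dy}{dx} = \frac{v\,\frac{du}{dx} - u\,\frac{dv}{dx}}{v^2}$  or  $y' = \frac{u'v - uv'}{v^2}.$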

Example. The function $y = x^{-n} = 1/x^n$ is the quotient of $u = 1$ and $v = x^n$. By applying (1.6) we get

$y = x^{-n} \;\Rightarrow\; y' = \frac{-n\,x^{n-1}}{x^{2n}} = -n \cdot x^{-n-1}.$

This is Eq. (1.5) for negative n.



Inverse Functions. Let $y = f(x)$ be a given function and $x = g(y)$ its inverse. Since the graphs are reflected in the 45° axis (Fig. 1.4), we have

$\frac{\Delta y}{\Delta x} = \frac{1}{\Delta x/\Delta y}$

and

(1.7)    $\frac{dy}{dx} = \frac{1}{dx/dy}$  for  $\frac{dx}{dy} \neq 0.$

Example. $y = x^{1/2}$ is the inverse function of $x = y^2$. Therefore,

$y = x^{1/2} \;\Rightarrow\; \frac{dy}{dx} = \frac{1}{dx/dy} = \frac{1}{2y} = \frac{1}{2\sqrt{x}} = \frac{1}{2}\,x^{-1/2}$

and Eq. (1.5) appears to be true for rational n.

Exponential Function. For the exponential function $y = e^x$ (Sect. I.2) we have

$y + \Delta y = e^{x + \Delta x} = e^x \cdot e^{\Delta x}$  and  $\Delta y = e^x (e^{\Delta x} - 1).$

Using the series $e^{\Delta x} = 1 + \Delta x + (\Delta x)^2/2! + \ldots$ (Theorem I.2.3) we therefore obtain

(1.8)    $y = e^x \;\Rightarrow\; y' = e^x.$

The exponential function is its own derivative.

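Logarithms. There are several ways to compute the derivative of $y = \ln x$.

a) It is the inverse function of $x = e^y$. By (1.7),

(1.9)    $y = \ln x \;\Rightarrow\; \frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}.$

b) We can also compute $\Delta y$ from $y + \Delta y = \ln(x + \Delta x)$ and obtain

$\Delta y = \ln(x + \Delta x) - \ln(x) = \ln\frac{x + \Delta x}{x} = \ln\left(1 + \frac{\Delta x}{x}\right).$

Since $\ln\left(1 + \frac{\Delta x}{x}\right) = \frac{\Delta x}{x} - \frac{(\Delta x)^2}{2x^2} + \ldots$ (see (I.3.13)) we again obtain (1.9).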

Trigonometric Functions. Consider first $y = \sin x$. Using Eq. (I.4.3) we get

$y + \Delta y = \sin(x + \Delta x) = \sin x \cos \Delta x + \cos x \sin \Delta x.$

With the series expansions for $\sin \Delta x$ and $\cos \Delta x$ (see (I.4.16) and (I.4.17)) we obtain

$\Delta y = \sin x \left(-\frac{(\Delta x)^2}{2!} + \ldots\right) + \cos x \left(\Delta x - \frac{(\Delta x)^3}{3!} + \ldots\right)$

and consequently

(1.10)    $y = \sin x \;\Rightarrow\; y' = \cos x.$

Similarly,

(1.11)    $y = \cos x \;\Rightarrow\; y' = -\sin x.$

For $y = \tan x = \sin x/\cos x$ we use (1.6) and obtain

(1.12)    $\frac{dy}{dx} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = 1 + \tan^2 x = \frac{1}{\cos^2 x}.$


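Inverse Trigonometric Functions. As a consequence of (1.7) and the above formulas for the derivatives of the trigonometric functions, we have

(1.13)    $y = \arctan x \;\Rightarrow\; \frac{dy}{dx} = \frac{1}{dx/dy} = \frac{1}{1 + \tan^2 y} = \frac{1}{1 + x^2},$

(1.14)    $y = \arcsin x \;\Rightarrow\; \frac{dy}{dx} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}},$

(1.15)    $y = \arccos x \;\Rightarrow\; \frac{dy}{dx} = \frac{1}{-\sin y} = \frac{-1}{\sqrt{1 - \cos^2 y}} = \frac{-1}{\sqrt{1 - x^2}}.$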


Composite Functions. Consider a function $y = h(x) = f(g(x))$ and let $z = g(x)$. For the incremented values we have $z + \Delta z = g(x + \Delta x)$ and $y + \Delta y = h(x + \Delta x) = f(z + \Delta z)$. From the trivial identity

$\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta z} \cdot \frac{\Delta z}{\Delta x}$

it follows that

(1.16)    $\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx}$  or  $h'(x) = f'(z) \cdot g'(x).$

In order to differentiate a composite function, one has to multiply the derivatives of the functions $f$ and $g$.

Example. The function $y = \sin(2x)$ is composed as $y = \sin z$ and $z = 2x$ (see Fig. 1.5). By (1.16) its derivative is $y' = \cos z \cdot 2 = 2\cos(2x)$.

Relying on these rules, the computation of the derivative of any function composed of elementary functions (Descartes’ great dream, see quotation at the beginning of this section) has become a banality. For instance,

$y = a^x = e^{x \cdot \ln a}$  ($z = x \cdot \ln a$)  $\Rightarrow$  $\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx} = e^{x \cdot \ln a} \cdot \ln a = \ln a \cdot a^x,$

$y = x^a = e^{a \cdot \ln x}$  ($z = a \cdot \ln x$)  $\Rightarrow$  $\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx} = e^{a \cdot \ln x} \cdot \frac{a}{x} = a \cdot x^{a-1}.$

Thus, we have Eq. (1.5) for any real number n.
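The two results obtained with the chain rule, namely that $a^x$ has derivative $\ln a \cdot a^x$ and that $x^a$ has derivative $a \cdot x^{a-1}$, can be checked numerically. The sketch below is only a plausibility test, not a proof; the helper names are hypothetical, and the symmetric difference quotient is our choice of approximation rather than anything used in the text.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, an approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def derivative_of_exponential(a, x):
    """Closed form ln(a) * a^x for the derivative of a^x."""
    return math.log(a) * a ** x


def derivative_of_power(a, x):
    """Closed form a * x^(a-1) for the derivative of x^a (x > 0, any real a)."""
    return a * x ** (a - 1)


x = 1.3
print(central_difference(lambda t: 2.0 ** t, x), derivative_of_exponential(2.0, x))
print(central_difference(lambda t: t ** math.pi, x), derivative_of_power(math.pi, x))
```

Each printed pair should agree to many digits; the exponent $\pi$ is used to stress that the power rule is not restricted to rational exponents.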



Parametric Representation and Implicit Equations 

We take as an example a curve of venerable age: the conchoid of Nicomedes (200 
B.C.). For two given constants a and b the conchoid is defined as follows: on any 
ray through the origin G the distance of a point A on the conchoid and the point F 
on a horizontal line of height a is of constant length b (see Fig. 1.6). 



FIGURE 1.6. The conchoid of Nicomedes 


The similarity of triangles FAB and FGF gives the relation

$\frac{y - a}{y} = \frac{b}{\sqrt{x^2 + y^2}},$

which leads to

(1.17)    $(y - a)^2 (x^2 + y^2) = b^2 y^2.$

If we wanted to express y as a function of x from this equation, we would have to 
solve a polynomial equation of degree 4 for each x. We should try instead to work 
with the implicit equation (1.17) itself. 

Another possibility is to denote the angle FGF by $\varphi$ and obtain

(1.18)    $x = a \tan\varphi + b \sin\varphi, \qquad y = a + b \cos\varphi.$

When $\varphi$ varies from $-\pi/2$ to $\pi/2$, the expressions (1.18) then form a parametric representation of our curve. Such parametric representations are not unique. For example, we may also use the distance GF as parameter $t$ (see Fig. 1.6). Then we obtain

(1.19)    $x = (b/t + 1)\sqrt{t^2 - a^2}, \qquad y = (b/t + 1)\,a.$

This represents the right half of the curve when $t$ varies from $a$ to $\infty$.

We now consider the problem of computing the tangent to the conchoid at a 
given point A (this is “Aufgabe 7” of Joh. Bernoulli 1691/92). 

Differentiation of the Parametric Equation. We consider $y$ in the second equation of (1.18) or (1.19) as a function of the parameter, and we interpret the parameter as the inverse function of $x$ of the first equation. Then we have by (1.16) and (1.7),

(1.20)    $\frac{dy}{dx} = \frac{dy}{d\varphi} \cdot \frac{d\varphi}{dx} = \frac{dy}{d\varphi} \Big/ \frac{dx}{d\varphi}$  or  $\frac{dy}{dx} = \frac{dy}{dt} \Big/ \frac{dx}{dt}.$

Thank you, Leibniz, once again, for your notation. Differentiating the equations (1.19) and dividing the derivatives we obtain for the conchoid

(1.21)    $\frac{dy}{dx} = \frac{-ab\sqrt{t^2 - a^2}}{t^3 + a^2 b}.$

This formula allows a nice interpretation (Joh. Bernoulli 1691/92): denote by M the point such that triangles LGF and GMA are similar. Then, the tangent at A is parallel to the line connecting M and F (see Fig. 1.6).


Implicit Differentiation. This method, already used by Leibniz (1684), consists of using the above rules to differentiate directly an implicit equation defining the function $y(x)$ (in our example the equation (1.17)). This gives

$2(y - a)\,dy\,(x^2 + y^2) + (y - a)^2 (2x\,dx + 2y\,dy) = 2b^2 y\,dy$

and after division by $2\,dx$,

(1.22)    $\frac{dy}{dx} = \frac{-x(y - a)^2}{(y - a)(x^2 + y^2) + (y - a)^2 y - b^2 y}.$

This implicit differentiation will be discussed more rigorously in Sect. IV.3.
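The parametric and the implicit approach must give the same slope at every point of the conchoid. A minimal Python sketch of this consistency check follows; the function names and the sample values $a = 1$, $b = 2$, $t = 2$ are our own illustrative choices, and the formulas are those of the parametrization by the distance $t$, of the slope obtained from it, and of the slope obtained by implicit differentiation.

```python
import math


def conchoid_point(t, a, b):
    """Point (x, y) of the right half of the conchoid for parameter t > a."""
    x = (b / t + 1) * math.sqrt(t * t - a * a)
    y = (b / t + 1) * a
    return x, y


def slope_parametric(t, a, b):
    """dy/dx obtained by dividing dy/dt by dx/dt."""
    return -a * b * math.sqrt(t * t - a * a) / (t ** 3 + a * a * b)


def slope_implicit(x, y, a, b):
    """dy/dx obtained by differentiating (y-a)^2 (x^2+y^2) = b^2 y^2."""
    numerator = -x * (y - a) ** 2
    denominator = (y - a) * (x * x + y * y) + (y - a) ** 2 * y - b * b * y
    return numerator / denominator


a, b, t = 1.0, 2.0, 2.0
x, y = conchoid_point(t, a, b)
print(slope_parametric(t, a, b), slope_implicit(x, y, a, b))
```

Both numbers should coincide up to rounding, which is a useful sanity test when either formula has been transcribed by hand.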


Exercises 

1.1 Extend the differentiation rule (1.4) to three factors

$y = u \cdot v \cdot w \;\Rightarrow\; y' = u' \cdot v \cdot w + u \cdot v' \cdot w + u \cdot v \cdot w'.$


1.2 Compute the derivative dy/dx of

5 sin(3x + by/x 2 + e 2æ ) • tan(

1.3 (An example of Euler 1755, § 192). Show that if

$y = e^{e^{e^x}}$  then  $y' = e^{e^{e^x}} \cdot e^{e^x} \cdot e^x.$

1.4 Compute the derivative of the cissoid of Diocles (about 180 B.C.). This curve, used by Diocles for solving the Delian problem of duplicating the cube, is created by the circle MCE as the set of points of intersection of the lines DM and BF, where the arcs BC and CD are equal. Show that the tangent at A is parallel to the line EH, where H is such that EF and GH are parallel.

1.5 Compute the derivative of the circle defined by $x^2 + y^2 = r^2$ by implicit differentiation as well as by solving for $y$ followed by explicit differentiation.

1.6 (Leibniz 1684). Compute the derivative of the function $y(x)$ defined by



$\frac{x}{y} + \frac{(a + bx) \cdot (c - xx)}{(ex + fxx)^2} + ax\sqrt{gg + yy} + \frac{yy}{\sqrt{hh + lx + mxx}} = 0,$


where $a, b, c, e, f, g, h, l,$ and $m$ are constants. This equation does not represent any ancient famous Babylonian or Egyptian curve and has no other particular interest either. It was just chosen by Leibniz as a horribly complicated expression in order to demonstrate the power of his calculus.


II.2 Higher Derivatives and Taylor Series

But the velocities of the velocities, the second, third, fourth, and fifth ve- 
locities, &c., exceed, if I mistake not, all human understanding. The further 
the mind analyseth and pursueth these fugitive ideas the more it is lost and 
bewildered; . . . 

(Bishop Berkeley 1734, The Analyst, see Struik 1969, Source Book, p. 335)

. . . our modern analysts are not content to consider only the differences
of finite quantities: they also consider the differences of those differences,
and the differences of the differences of the first differences. And so on
ad infinitum. That is, they consider quantities infinitely less than the least
discernible quantity; and others infinitely less than those infinitely small
ones; and still others infinitely less than the preceding infinitesimals, and 
so without end or limit . . . Now to conceive a quantity infinitely small . . . 
is, I confess, above my capacity. But to conceive a part of such infinitely 
small quantity that shall be still infinitely less than it, and consequently 
though multiplied infinitely shall never equal the minutest finite quantity, 
is, I suspect, an infinite difficulty to any man whatsoever; . . . 

(Bishop Berkeley 1734, The Analyst) 


The Second Derivative 

We have seen in Sect. II.1 that for a given function $y = f(x)$ the derivative $f'(x)$ is the slope of the tangent to the curve $y = f(x)$. Therefore, if $f'(x) > 0$ for $a < x < b$, the function is increasing on that interval; if $f'(x) < 0$ for $a < x < b$, it is decreasing. Points at which $f'(x) = 0$ are called stationary points.



FIGURE 2.1a. Geometrical meaning of the second derivative

FIGURE 2.1b. A drawing of Joh. Bernoulli (1691/92)¹


Newton (1665) and Joh. Bernoulli (1691/92) were the first to study the geometric meaning of the second derivative of $f$. We differentiate $y' = f'(x)$ to obtain $y'' = f''(x)$. If $f''(x) > 0$ for $a < x < b$, then $f'(x)$ will be increasing, i.e., for two points $x_0 < x_1$ we will have $f'(x_0) < f'(x_1)$. This means that the curve is steeper at $x_1$ than at $x_0$ and therefore is crooked upward (see Fig. 2.1a, left). We then say that the function $f(x)$ is convex downward.

Similarly, if $f''(x) < 0$ for $a < x < b$, the function $f(x)$ is convex upward (see Fig. 2.1a, right). Points with $f''(x_0) = 0$, where the second derivative changes sign, are called inflection points. Fig. 2.1b reproduces a drawing of Joh. Bernoulli explaining these facts.

¹ Reproduced with permission of Univ. Bibl. Basel.

Problems “de maximis & minimis”.

I just wish him to know that our questions de maximis et minimis and de
tangentibus linearum curvarum were perfect eight or ten years ago and that
several persons who have seen them in the last five or six years can bear
witness to this.

(Letter from Fermat to Descartes, June 1638; Oeuvres, tome 2, p. 154-162)

When a Quantity is the greatest or the least that it can be, at that moment
it neither flows backward or forward. For if it flows forward, or increases,
that proves it was less, and will presently be greater than it is. . . . Wherefore
find its Fluxion, by Prob. 1 and suppose it to be nothing.

(Newton 1671, Engl. pub. 1736, p. 44)

The problem of finding maximal or minimal values was one of the very first moti- 
vations for the differential calculus (Fermat 1638) and was cultivated by Lagrange 
throughout his life (see Lagrange 1759). 

At a maximal or minimal value of a function $f(x)$, this function can neither increase nor decrease. Hence we must have $f'(x_0) = 0$ (stationary point). It will be a (local) maximum if the sign of $f'(x)$ changes from $+$ to $-$ (this is the case if $f''(x_0) < 0$) and a (local) minimum if it changes from $-$ to $+$ (this happens if $f''(x_0) > 0$). We summarize this as

(2.1)    $f'(x_0) = 0$ and $f''(x_0) > 0 \;\Rightarrow\; x_0$ is a local minimum,
         $f'(x_0) = 0$ and $f''(x_0) < 0 \;\Rightarrow\; x_0$ is a local maximum.


These facts “sequentibus exemplis illustrabimus”:

Example 1. We choose

(2.2)    $y = x^3 - x^2 - 3x, \qquad y' = 3x^2 - 2x - 3, \qquad y'' = 6x - 2.$

The function can be seen to increase where $y' > 0$, i.e., for $x < (1 - \sqrt{10})/3$ and for $x > (1 + \sqrt{10})/3$. It is convex downward for $x > 1/3$ and convex upward for $x < 1/3$. The point $x = 1/3$ is an inflection point. The point $x = (1 - \sqrt{10})/3$ is a local (but not global) maximum, the point $x = (1 + \sqrt{10})/3$ is a local minimum.
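The classification in Example 1 can be reproduced mechanically: locate the zeros of $y'$ and look at the sign of $y''$ there. The following Python sketch does this for the cubic of the example; the function names are illustrative, and the label "undecided" for a vanishing second derivative is our own convention.

```python
import math


def first_derivative(x):
    """y' = 3x^2 - 2x - 3 for y = x^3 - x^2 - 3x."""
    return 3 * x * x - 2 * x - 3


def second_derivative(x):
    """y'' = 6x - 2."""
    return 6 * x - 2


def classify(x0):
    """Apply the second-derivative test at a stationary point x0."""
    curvature = second_derivative(x0)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "undecided"


# Zeros of y' from the quadratic formula.
stationary_points = [(1 - math.sqrt(10)) / 3, (1 + math.sqrt(10)) / 3]
for x0 in stationary_points:
    print(x0, first_derivative(x0), classify(x0))
```

The test is only a sufficient criterion: when the second derivative vanishes at a stationary point, the sign change of $y'$ has to be examined directly.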





Example 2. We consider the function (see Euler 1755, Pars Posterior, §265)

(2.3)    $y = \frac{x}{1 + x^2}, \qquad y' = \frac{1 - x^2}{(1 + x^2)^2}, \qquad y'' = \frac{-6x + 2x^3}{(1 + x^2)^3},$

which, together with its first and second derivative, is plotted in Fig. 2.2. The function $y(x)$ possesses a (global) minimum for $x = -1$, a (global) maximum for $x = 1$, and inflection points at $x = 0$ and $x = \pm\sqrt{3}$. It is convex downward on the intervals $-\sqrt{3} < x < 0$ and $\sqrt{3} < x < \infty$ and convex upward elsewhere.

FIGURE 2.2. Maxima, minima, inflection points of Euler’s example


Fermat’s Principle.

FIGURE 2.3. Drawing by Joh. Bernoulli 1691/92²

FIGURE 2.4. Fermat’s principle


Fermat wishes to explain the law of Snellius for the refraction of light between two media in which the velocities are $v_1$ and $v_2$, respectively. Let two points A, B (see Fig. 2.4) be given. Find angles $\alpha_1$ and $\alpha_2$ such that light travels from A to B in minimal time or with minimal resistance. This means, find $x$ such that

(2.4)    $T = \frac{\sqrt{a^2 + x^2}}{v_1} + \frac{\sqrt{b^2 + (\ell - x)^2}}{v_2} = \min!$

² Reproduced with permission of Univ. Bibl. Basel.

Fermat himself found the problem too difficult for an analytical treatment (“I admit that this problem is not one of the easiest”). The computations were then proudly performed by Leibniz (1684) “in tribus lineis”. The derivative of $T$ as a function of $x$ is

$T' = \frac{1}{v_1} \cdot \frac{2x}{2\sqrt{a^2 + x^2}} + \frac{-2(\ell - x)}{2\sqrt{b^2 + (\ell - x)^2}} \cdot \frac{1}{v_2}.$

Observing that $\sin\alpha_1 = x/\sqrt{a^2 + x^2}$ and $\sin\alpha_2 = (\ell - x)/\sqrt{b^2 + (\ell - x)^2}$, we see that this derivative vanishes whenever

(2.5)    $\frac{\sin\alpha_1}{v_1} = \frac{\sin\alpha_2}{v_2}$

(law of Snellius). The computation of $T''$,

$T'' = \frac{1}{v_1} \cdot \frac{a^2}{(a^2 + x^2)^{3/2}} + \frac{1}{v_2} \cdot \frac{b^2}{(b^2 + (\ell - x)^2)^{3/2}} > 0,$

shows that our result is really a minimum.
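Since $T''$ is positive, $T'$ is increasing and has exactly one zero between $x = 0$ and $x = \ell$, which can be located by bisection. The Python sketch below does so and then compares the two sides of the law of Snellius. It assumes $a, b > 0$; the function names, the use of bisection, and the sample data are our own choices for illustration.

```python
import math


def travel_time_derivative(x, a, b, ell, v1, v2):
    """T'(x) for T(x) = sqrt(a^2 + x^2)/v1 + sqrt(b^2 + (ell - x)^2)/v2."""
    return x / (v1 * math.hypot(a, x)) - (ell - x) / (v2 * math.hypot(b, ell - x))


def fastest_crossing(a, b, ell, v1, v2, iterations=100):
    """Zero of T' on [0, ell] by bisection (T' < 0 at 0 and T' > 0 at ell)."""
    lo, hi = 0.0, ell
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if travel_time_derivative(mid, a, b, ell, v1, v2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def snell_ratios(x, a, b, ell, v1, v2):
    """The two quantities sin(alpha_1)/v1 and sin(alpha_2)/v2."""
    sin1 = x / math.hypot(a, x)
    sin2 = (ell - x) / math.hypot(b, ell - x)
    return sin1 / v1, sin2 / v2


x_opt = fastest_crossing(1.0, 2.0, 3.0, 1.0, 1.5)
print(x_opt, snell_ratios(x_opt, 1.0, 2.0, 3.0, 1.0, 1.5))
```

At the computed crossing point the two ratios should agree up to rounding; with equal velocities and $a = b$ the crossing point is the midpoint, as symmetry demands.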


De Conversione Functionum in Series 

Taylor’s Approach. 

We have here, in fact, a passage to the limit of unexampled audacity.

(F. Klein 1908, Engl. ed., p. 233)

We consider (Taylor 1715) for a function $f(x)$ the points $x_0$, $x_1 = x_0 + \Delta x$, $x_2 = x_0 + 2\Delta x$, . . . and the function values $y_0 = f(x_0)$, $y_1 = f(x_1)$, $y_2 = f(x_2)$, . . . . Then we compute the interpolation polynomial passing through these points (see Fig. 2.5 and Theorem I.1.2; for the latter we define $x = x_0 + t\,\Delta x$, $t = (x - x_0)/\Delta x$)

(2.6)    $p(x) = y_0 + \frac{x - x_0}{1} \cdot \frac{\Delta y_0}{\Delta x} + \frac{(x - x_0)(x - x_1)}{1 \cdot 2} \cdot \frac{\Delta^2 y_0}{\Delta x^2},$

or with more such terms for higher degrees. If we let $\Delta x \to 0$, $x_1 \to x_0$, $x_2 \to x_0$ (or, as we said: if we take $\Delta x$ infinitely small), the quotient $\Delta y_0/\Delta x$ in the second term tends to $f'(x_0)$. Further, the product $(x - x_0)(x - x_1)$, which appears in the third term, will tend to $(x - x_0)^2$. It was then postulated by Taylor that the second differences (divided by $\Delta x^2$) will tend to the second derivative (see Exercises 2.5 and III.6.4); in general,

(2.7)    $\frac{\Delta^k y_0}{\Delta x^k} \to \frac{d^k y}{dx^k}\Big|_{x = x_0} = f^{(k)}(x_0).$

If we consider in the interpolation polynomial (2.6) more and more terms, and, at the same time, take the limit as $\Delta x \to 0$, we obtain the famous formula

(2.8)    $f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{(x - x_0)^2}{2!} f''(x_0) + \frac{(x - x_0)^3}{3!} f'''(x_0) + \ldots.$

All the series of the first chapter are special cases of this “series universalissima”. For example, the function $f(x) = \ln(1 + x)$ has the derivatives

$f(0) = 0, \qquad f'(0) = 1, \qquad f^{(k)}(0) = (-1)^{k-1}(k - 1)!$

and we obtain

$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots.$

Remarks. Formula (2.8) was believed to be generally true for more than a century. Cauchy then found an example of a function for which the series (2.8) converges, but not to $f(x)$ (see Sect. III.7). There are also examples of functions for which the series (2.8) does not converge at all for $x \neq x_0$ (see Exercise III.7.6). A more satisfactory proof of (2.8) (due to Joh. Bernoulli) uses integral calculus and will be given in Sect. II.4.

Maclaurin’s Approach (Maclaurin 1742, p. 223-224, art. 255). For the function $y = f(x)$ and a given point $x_0$ we look for a series (or polynomial)

(2.9)    $p(x) = p_0 + (x - x_0) q_0 + (x - x_0)^2 r_0 + (x - x_0)^3 s_0 + \ldots,$

for which

(2.10)    $p^{(i)}(x_0) = f^{(i)}(x_0), \qquad i = 0, 1, 2, \ldots,$

i.e., both functions have the same derivatives up to a certain order at $x = x_0$. Setting $x = x_0$ in (2.9) yields $p_0 = p(x_0) = f(x_0)$ by (2.10). We then differentiate (2.9), again set $x = x_0$, and obtain $q_0 = p'(x_0) = f'(x_0)$. Further differentiations give $2!\,r_0 = f''(x_0)$, $3!\,s_0 = f'''(x_0)$, and so on. Therefore, the series (2.9) is identical to that of (2.8).

Partial sums of the series (2.8) are called Taylor polynomials. 

Example. For the function given in (2.2) we choose the point $x_0 = 1$ and have $f(x_0) = -3$, $f'(x_0) = -2$, $f''(x_0) = 4$, and $f'''(x_0) = 6$. Thus, the Taylor polynomials of degree 1, 2, and 3 become

$p_1(x) = -3 - 2(x - 1) = -2x - 1,$

$p_2(x) = p_1(x) + \frac{4}{2}(x - 1)^2 = 2x^2 - 6x + 1,$

$p_3(x) = p_2(x) + \frac{6}{6}(x - 1)^3 = x^3 - x^2 - 3x.$
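The Taylor polynomials of this example can be generated directly from the list of derivative values at $x_0 = 1$. The sketch below is a minimal illustration; `taylor_polynomial` and `derivatives_at_1` are hypothetical names, and the derivative values are simply those quoted in the example.

```python
from math import factorial


def taylor_polynomial(derivatives, x0, degree):
    """Return p(x) = sum_{k <= degree} derivatives[k] * (x - x0)^k / k!."""
    def p(x):
        return sum(derivatives[k] * (x - x0) ** k / factorial(k)
                   for k in range(degree + 1))
    return p


def f(x):
    """The cubic y = x^3 - x^2 - 3x of the example."""
    return x ** 3 - x ** 2 - 3 * x


# f(1), f'(1), f''(1), f'''(1) as computed in the text.
derivatives_at_1 = [-3, -2, 4, 6]

for degree in (1, 2, 3):
    p = taylor_polynomial(derivatives_at_1, 1.0, degree)
    print(degree, [p(x) for x in (0.0, 1.0, 2.0)], [f(x) for x in (0.0, 1.0, 2.0)])
```

Because the function is itself a cubic, the polynomial of degree 3 reproduces it exactly, while the lower degrees agree with it only near $x_0 = 1$.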






Newton’s Method for Roots of Equations. The Taylor polynomials are an extremely useful tool for the approximate computation of roots. We consider the example treated by Newton (1671),

(2.11)    $x^3 - 2x - 5 = 0.$

Trying out a few values of the function $f(x) = x^3 - 2x - 5$, we find $f(0) = -5$, $f(1) = -6$, $f(2) = -1$, $f(3) = 16$. Hence, there is a root close to $x_0 = 2$. The idea is now to replace the curve $f(x)$ by its tangent line at the point $x_0$, which is $p_1(x) = -1 + 10(x - 2)$. The root of $p_1(x) = 0$, which is $x = 2.1$, is then an improved approximation to the root of (2.11). We now choose $x_0 = 2.1$ and repeat the calculation. This gives $p_1(x) = 0.061 + 11.23(x - 2.1)$ and $x = 2.0945681$ as new approximation of the root of (2.11). A further step yields $x = 2.0945515$, where all digits shown are correct (see in Fig. 2.6 a facsimile of the calculation done by Newton).

FIGURE 2.6. Newton’s calculation for $x^3 - 2x - 5 = 0$³
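The tangent-line iteration described above can be sketched in a few lines of Python. This is a modern restatement under our own naming (`newton_steps` and the helper functions are illustrative), not a transcription of Newton's original scheme of successive substitutions shown in the facsimile.

```python
def f(x):
    """Newton's example f(x) = x^3 - 2x - 5."""
    return x ** 3 - 2 * x - 5


def df(x):
    """Derivative f'(x) = 3x^2 - 2."""
    return 3 * x ** 2 - 2


def newton_steps(x0, n):
    """Return [x0, x1, ..., xn], where each step uses the root of the tangent line."""
    xs = [x0]
    for _ in range(n):
        x0 = x0 - f(x0) / df(x0)
        xs.append(x0)
    return xs


for x in newton_steps(2.0, 3):
    print(x, f(x))
```

Starting from 2, the successive values should reproduce the approximations 2.1, 2.0945681 and 2.0945515 quoted in the text, with the residual $f(x)$ shrinking rapidly.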

³ Reproduced with permission of Bibl. Publ. Univ. Genève.

Use of the second degree polynomial (E. Halley 1694). We choose for the above example the point $x_0 = 2.1$ and use the Taylor polynomial of degree 2. This gives

$0.061 + 11.23(x - 2.1) + 6.3(x - 2.1)^2 = 0,$

a quadratic equation in $z = x - 2.1$, which has two roots. We choose the one that is smaller in absolute value (i.e., for which $x$ is closer to 2.1) and obtain

$z = \frac{-11.23 + \sqrt{11.23^2 - 4 \cdot 0.061 \cdot 6.3}}{12.6},$

hence, $x = 2.0945515$. Again, all digits shown are correct, obtained this time with only one iteration.
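One step of this second-degree variant can be sketched as follows: form the Taylor polynomial of degree 2 at $x_0$, solve the quadratic, and keep the root of smaller absolute value. The function names are ours, and the use of `math.copysign` to select that root is an implementation choice, not something taken from the text.

```python
import math


def f(x):
    """Newton's example f(x) = x^3 - 2x - 5."""
    return x ** 3 - 2 * x - 5


def df(x):
    """First derivative 3x^2 - 2."""
    return 3 * x ** 2 - 2


def d2f(x):
    """Second derivative 6x."""
    return 6 * x


def quadratic_step(x0):
    """Solve c0 + c1*z + c2*z^2 = 0 and return x0 + z for the smaller |z|."""
    c0, c1, c2 = f(x0), df(x0), d2f(x0) / 2
    root = math.sqrt(c1 * c1 - 4 * c0 * c2)
    # Giving the square root the sign of c1 makes -c1 and the root nearly cancel,
    # which selects the solution z of smaller absolute value.
    z = (-c1 + math.copysign(root, c1)) / (2 * c2)
    return x0 + z


print(quadratic_step(2.1))
```

The sketch assumes a positive discriminant and a nonzero second derivative at $x_0$, which holds in this example; for $x_0 = 2.1$ the coefficients are exactly the 0.061, 11.23 and 6.3 of the text.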

Exercises 

2.1 (Euler 1755, §261). Study the functions

$y = x^4 - 8x^3 + 22x^2 - 24x + 12, \qquad y = x^5 - 5x^4 + 5x^3 + 1.$

Find maxima, minima, convex downward regions, inflection points.

2.2 (Euler 1755, §272). The sequence of numbers

$\sqrt[1]{1} = 1, \quad \sqrt[2]{2} = 1.4142, \quad \sqrt[3]{3} = 1.4422, \quad \sqrt[4]{4} = 1.4142, \quad \sqrt[5]{5} = 1.3797, \ldots$

suggests that the function $y = \sqrt[x]{x} = x^{1/x}$ possesses a maximum value close to $x = 3$. Where exactly? In which relation is this value with the minimum value of $y = x^x$?

2.3 (Joh. Bernoulli 1691/92). Find $x$ such that the rectangle formed by the abscissa and the ordinate for a point on the circle $y = \sqrt{x - x^2}$ has maximal area. Verify the maximality by computing the second derivative.

2.4 (Euler 1755, §272). Find $x$ such that $x \sin x$ possesses a (local) maximum (you will find an equation that is best solved by Newton’s or Halley’s method; Euler gives the result $x = 116^\circ\,14'\,21''\,20'''\,35''''\,47'''''$; the correct value of the last digits is $32''''\,38'''''$).

2.5 Compute for the function $y = x^3$ the second difference

$\Delta^2 y = (x + 2\Delta x)^3 - 2(x + \Delta x)^3 + x^3.$

Show that this difference, divided by $\Delta x^2$, tends, for $\Delta x \to 0$, to $6x$, the second derivative.

2.6 Let $f(x) = \sin(x^2)$. Compute $f'(x)$, $f''(x)$, $f'''(x)$, $f''''(x)$, . . . to obtain the series of Taylor

$f(x) = f(0) + f'(0)\,x + f''(0)\,\frac{x^2}{2!} + f'''(0)\,\frac{x^3}{3!} + f''''(0)\,\frac{x^4}{4!} + \ldots.$

Is there a much better way of obtaining this result?

2.7 Show that Newton’s method, applied to $x^2 - 2 = 0$, is identical to (I.2.13), the Babylonian computation of $\sqrt{2}$. However, formula (I.2.14) is different from Halley’s method. Why?

2.8 (Leibniz 1710). For a function $y(x) = u(x) \cdot v(x)$ show, by extending (1.4), that

$y'' = u''v + 2u'v' + uv'', \qquad y''' = u'''v + 3u''v' + 3u'v'' + uv'''.$

Find a general rule for $y^{(n)}$.


II.3 Envelopes and Curvature


My Brother, Professor at Basle, has taken this opportunity to investigate
several curves that Nature sets before our eyes every day . . .

(Joh. Bernoulli 1692)

I am quite convinced that there is hardly a geometer in the world who can
be compared to you.

(de L’Hospital 1695, letter to Joh. Bernoulli)


Envelope of a Family of Straight Lines 

Inspired by a drawing of A. Dürer (1525, p. 38, see Fig. 3.1, right), we consider a point $(a, 0)$ moving on the $x$-axis and the point $(0, 13 - a)$ moving on the $y$-axis in opposite direction. If we connect these points by a straight line

(3.1)    $y = \frac{a - 13}{a}\,(x - a) = 13 + x - a - \frac{13x}{a},$

we obtain an infinity of lines which are displayed in Fig. 3.1, and which create an interesting curve, called the envelope, which is tangent to each of these lines. The problem is to compute this curve. This kind of problem was extensively discussed between Leibniz (see Leibniz 1694a), Joh. Bernoulli and de L’Hospital.





FIGURE 3.1. Family of straight lines forming a parabola and a sketch by Dürer (1525)¹


¹ Reproduced with permission of Verlag Dr. Alfons Uhl, Nördlingen.

Idea. We fix the variable $x$ to an arbitrary value, say, $x = 4$, for which the family (3.1) becomes $y = 17 - a - 52/a$. We then observe that this value first increases for increasing $a$ (see Fig. 3.1; for $a = 3, 4, 5, 6$ we have $y = -3.33, 0, 1.6, 2.33$ respectively). During this time the point $(4, y)$ approaches the envelope. The envelope is finally reached precisely when this function attains its maximum value, that is, where the derivative $y' = -1 + 52/a^2$ vanishes, i.e., for $a = \sqrt{52}$. This value is $y = 17 - 2\sqrt{52} = 2.58$.

The same idea works for any value of $x$: we have to compute the derivative of (3.1) with respect to $a$ by considering $x$ as a constant (“differentiare secundum a”). This is called the partial derivative with respect to $a$. At points of the envelope this derivative must vanish. Today we denote this as (see Sect. IV.3 below, see also Jacobi 1827, Oeuvres, vol. 3, p. 65)

(3.2)    $\frac{\partial y}{\partial a} = 0.$

For Eq. (3.1) this becomes $\partial y/\partial a = -1 + 13x/a^2$ and condition (3.2) gives $a = \sqrt{13x}$. We obtain the envelope by inserting this into (3.1),

(3.3)    $y = x - 2\sqrt{13x} + 13$

or

(3.4)    $(y - x - 13)^2 = 52x.$

This is the equation of a conic, which, in our case, turns out to be a parabola.
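The idea of the envelope as the highest point reached by the family at a fixed $x$ can be tested by brute force: scan the parameter $a$ over a fine grid, take the largest $y$, and compare with the closed form $y = x - 2\sqrt{13x} + 13$. The Python sketch below does this; the grid search, the grid size, and all names are our own illustrative choices.

```python
import math


def line_y(a, x):
    """Height at x of the line joining (a, 0) and (0, 13 - a), for a > 0."""
    return 13 + x - a - 13 * x / a


def envelope_y(x):
    """Closed-form envelope y = x - 2*sqrt(13x) + 13."""
    return x - 2 * math.sqrt(13 * x) + 13


def best_parameter(x, n=100000):
    """Grid value of a in (0, 13] that maximizes the height of the line at x."""
    return max((13.0 * k / n for k in range(1, n + 1)),
               key=lambda a: line_y(a, x))


x = 4.0
a_best = best_parameter(x)
print(a_best, math.sqrt(13 * x))          # maximizing parameter versus sqrt(13x)
print(line_y(a_best, x), envelope_y(x))   # highest line versus envelope
```

For $x = 4$ the maximizing parameter should lie close to $\sqrt{52}$ and the height close to the value 2.58 mentioned in the text.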


The Caustic of a Circle 

Problem. Let $x^2 + y^2 = 1$ be a circle (Fig. 3.2) and suppose that parallel vertical rays are reflected by this circle. This yields a new family of straight lines which apparently produce an interesting envelope. Find the equation of this envelope.

Joh. Bernoulli (1692) gives a solution “per vulgarem Geometriam Cartesianam”; on the other hand, in his “Lectiones” (Joh. Bernoulli 1691/92b, Lectio

This is, together with (3.7), a parametric representation of the caustic. If we want $y$ expressed by $x$, we insert $\sin a = x^{1/3}$ and obtain

(3.8)


Envelope of Ballistic Curves

Problem. A cannon shoots bullets with initial velocity $v_0 = 1$ at all elevations. We wish to find the envelope of all ballistic parabolas (Fig. 3.4). This question, already considered by E. Torricelli (De motu projectorum 1644), was among the first problems which fascinated the young Joh. Bernoulli (see Briefwechsel, p. 111).

FIGURE 3.4a. Envelope of shooting parabolas

FIGURE 3.4b. The “Sun Fountain” from 1721, in “Peterhof”, St. Petersburg

Solution. Let $a$ be the slope of the cannon. Then the movement of the bullet (under a gravitational acceleration of $g = 1$) is given by

$x(t) = \frac{t}{\sqrt{1 + a^2}}, \qquad y(t) = \frac{at}{\sqrt{1 + a^2}} - \frac{t^2}{2}.$

Eliminating the parameter $t = x\sqrt{1 + a^2}$, we get

(3.9)    $y = ax - \frac{x^2 (1 + a^2)}{2}.$

Differentiation of (3.9) with respect to $a$ gives $\partial y/\partial a = x - ax^2$ and the condition (3.2) leads to $a = 1/x$. Inserting this into (3.9), we obtain

$y = (1 - x^2)/2,$

so that the envelope is a parabola with the cannon at its focus.


Curvature 


There are few Problems concerning Curves more elegant than this, or that
give a greater Insight into their nature.

(Newton 1671, Engl. pub. 1736, p. 59) 

Problem. For a given curve $y = f(x)$ and a given point $(a, f(a))$ on this curve, we want to find the equation of a circle that approximates as well as possible the function $f(x)$ in the neighborhood of $a$. This circle is then called the circle of curvature and its center is the center of curvature. The inverse of its radius is called the curvature of the curve at the point $(a, f(a))$.

Idea (Newton 1671). Let

(3.10)    $y = f(a) - \frac{1}{f'(a)}\,(x - a)$

be the normal to the curve $y = f(x)$ at the point $x = a$. If we increase $a$ (“imagine the point D to move in the curve an infinitely little distance”), we find a second normal that intersects the first one at the center of curvature (Fig. 3.5).

The situation is identical to that of the envelopes (see Fig. 3.1b). Thus, we compute

(3.11)    $\frac{\partial y}{\partial a} = f'(a) + \frac{f''(a)}{(f'(a))^2}\,(x - a) + \frac{1}{f'(a)}$

and condition (3.2) yields for the center of curvature $(x_0, y_0)$

(3.12)    $x_0 - a = -\frac{f'(a)\,\bigl(1 + (f'(a))^2\bigr)}{f''(a)}, \qquad y_0 - f(a) = \frac{1 + (f'(a))^2}{f''(a)}.$

For the radius $r = \sqrt{(x_0 - a)^2 + (y_0 - f(a))^2}$ and for the curvature $\kappa$, we thus get

(3.13)    $r = \frac{\bigl(1 + (f'(a))^2\bigr)^{3/2}}{|f''(a)|}$  and  $\kappa = \frac{|f''(a)|}{\bigl(1 + (f'(a))^2\bigr)^{3/2}}.$

FIGURE 3.5. Curvature, sketches by Newton 1671 (Meth. Fluxionum, French transl. 1740)²

Example. For the parabola y = x 2 we get r = (1 + 4a 2 ) 3 / 2 /2, and the center of 
curvature is given by 


Reproduced with permission of Editions Albert Blanchard, Paris. 
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These formulas form a parametric representation of the geometric locus (xq ■ Vo) 
of the centers of curvature. It is called the evolute. In the situation of Eq. (3.14), 
the parameter (here a) can be eliminated and we obtain (see Fig. 3.6b) 



Fig. 3.6a illustrates the fact that the evolute is the envelope of the family of normals
to the given curve.
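The formulas (3.12) and (3.13) can be tried out numerically. The following is a minimal sketch, assuming the derivatives f' and f'' are supplied by hand; the function names `center_of_curvature` and `parabola_center` are illustrative and not part of the text.

```python
import math

def center_of_curvature(f, df, d2f, a):
    """Center (x0, y0) and radius r of the circle of curvature at x = a,
    computed from (3.12) and (3.13); df and d2f are f' and f''."""
    d1, d2 = df(a), d2f(a)
    s = 1.0 + d1 * d1
    x0 = a - d1 * s / d2
    y0 = f(a) + s / d2
    r = s ** 1.5 / abs(d2)
    return x0, y0, r

def parabola_center(a):
    """Special case y = x^2, where f' = 2x and f'' = 2."""
    return center_of_curvature(lambda x: x * x, lambda x: 2 * x, lambda x: 2.0, a)

for a in (0.0, 0.5, 1.0):
    x0, y0, r = parabola_center(a)
    # r should equal the distance from (a, a^2) to the center (x0, y0)
    print(a, x0, y0, r, math.hypot(x0 - a, y0 - a * a))
```

For the parabola, the printed centers can be compared with the point of the evolute belonging to the same value of a.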



FIGURE 3.6a. Evolute = envelope of the normals
FIGURE 3.6b. Parabola y = x^2 and its evolute


Curvature of a Curve in Parametric Representation. Consider a curve given by
(x(t), y(t)) and suppose that close to the point (x(a), y(a)) it can be represented
as y = f(x). Then, we have by (1.20)

$f'(x) = \frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{y'(t)}{x'(t)},$

and for the second derivative

$f''(x) = \frac{d}{dx}\Bigl(\frac{dy}{dx}\Bigr) = \frac{d}{dt}\Bigl(\frac{y'(t)}{x'(t)}\Bigr) \Big/ \frac{dx}{dt} = \frac{x'(t)\,y''(t) - x''(t)\,y'(t)}{x'(t)^3}.$

Inserted into Eqs. (3.12) and (3.13), we get

(3.15) $x_0 - x(a) = -\frac{y'(a)\,\bigl(x'(a)^2 + y'(a)^2\bigr)}{x'(a)\,y''(a) - x''(a)\,y'(a)},$

(3.16) $y_0 - y(a) = \frac{x'(a)\,\bigl(x'(a)^2 + y'(a)^2\bigr)}{x'(a)\,y''(a) - x''(a)\,y'(a)},$

(3.17) $r = \frac{\bigl(x'(a)^2 + y'(a)^2\bigr)^{3/2}}{|x'(a)\,y''(a) - x''(a)\,y'(a)|}.$




FIGURE 3.7. Cycloid and its evolute

FIGURE 3.8. A cycloid drawn by Joh. Bernoulli (1955, p. 254, letter of Jan. 12, 1695 to de
L’Hospital)³

Example. The cycloid (trajectory of the valve of the wheel of a bike) is given by 
the parametric representation 

(3.18) $x = t - \sin t, \qquad y = 1 - \cos t.$

Computing its derivatives, we obtain from Eqs. (3.15) through (3.17) that the evolute
of the cycloid is given by

(3.19) $x_0 = a + \sin a, \qquad y_0 = -1 + \cos a.$

This is a cycloid again, in a different position. 
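The passage from (3.18) to (3.19) can be checked numerically. The sketch below evaluates (3.15) and (3.16) with the derivatives of the cycloid written out by hand; `evolute_point` and `cycloid_evolute` are illustrative names, and parameter values with cos a = 1 must be avoided because the denominator vanishes there.

```python
import math

def evolute_point(x, y, dx, dy, ddx, ddy):
    """Center of curvature from (3.15) and (3.16), given the point (x, y)
    and the first and second derivatives with respect to the parameter."""
    den = dx * ddy - ddx * dy
    s = dx * dx + dy * dy
    return x - dy * s / den, y + dx * s / den

def cycloid_evolute(a):
    """Center of curvature of the cycloid (3.18) at parameter value a."""
    x, y = a - math.sin(a), 1 - math.cos(a)
    dx, dy = 1 - math.cos(a), math.sin(a)
    ddx, ddy = math.sin(a), math.cos(a)
    return evolute_point(x, y, dx, dy, ddx, ddy)

for a in (0.5, 1.0, 2.0, 4.0):
    x0, y0 = cycloid_evolute(a)
    # compare with the closed form (3.19)
    print(a, x0 - (a + math.sin(a)), y0 - (-1 + math.cos(a)))
```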

Involutes. We now start from a given evolute ABB (see Fig. 3.6b) and construct a
new curve CC defined by the property that the arc length ABC is constant (imagine
a string unwinding from the evolute). These new curves are called involutes. If one
point of the involute coincides with the original function f(x), both curves will
have the same curvature. It then follows (to be proved rigorously by the ideas of
Sect. III.6) that both curves are identical. Hence, not only the evolute, but also the
involute of the cycloid (with the correct choice of the arc length) is again a cycloid
(Newton 1671, Prob. V, Nr. 34). Huygens (1673) used this property to construct the
best pendulum-clocks of his century, based on the fact that a pendulum following
a cycloid is isochronous (see Fig. 7.8 of Sect. II.7).

3 Reproduced with permission of Birkhaeuser Verlag, Basel.

Exercises


3.1 A bar of length 1 glides along a vertical wall (see Fig. 3.9a). Find a formula 
for the created envelope. 

3.2 Find a formula for the envelope (see Fig. 3.9b) created by the family


FIGURE 3.9. Evolutes and envelopes


3.3 (Cauchy 1824). Find the envelope created by the family of parabolas 

$y = b(x + b)^2$


with parameter b (see Fig. 3.9c). 

3.4 Compute for the function y = ln x the radius of curvature at the point a and 
determine a for which this radius is minimal (see Fig. 3.9d). It can be seen 
that the evolute has a stationary point (a cusp) at this minimal position. 

3.5 Compute the evolute of the ellipse (see Fig. 3.9e) 


$x = a\cos t, \qquad y = b\sin t.$

Determine the maximal and minimal curvature.

Result. $x = \Bigl(a - \frac{b^2}{a}\Bigr)\cos^3 t, \qquad y = \Bigl(b - \frac{a^2}{b}\Bigr)\sin^3 t.$

3.6 Compute the radius of curvature of the catenary y = (e^x + e^{-x})/2. Show
that this radius for a given point M on the curve is equal to the length of the
normal MN (see Fig. 3.9f).

3.7 One observes in Fig. 3.7 that a spoke of a rolling wheel creates an envelope
that resembles a half-sized mini cycloid. This becomes more visible when
the entire diameter is drawn (Fig. 3.10). Compute the envelope of this family
of straight lines

$y = 1 + (x - t)\,\frac{\cos t}{\sin t}.$

FIGURE 3.10. Small cycloid as envelope



Johann Bernoulli (1667-1748)⁴ Marquis de Sainte-Mesme et du Montellier

Compte d’Autremonts, Seigneur d’Ouques et autres lieux 


4 Reproduced with permission of Georg Olms Verlag, Hildesheim. 

5 Reproduced with permission of Birkhaeuser Verlag, Basel.

II.4 Integral Calculus

. . . notam ∫ pro summis, ut adhibetur nota d pro differentiis . . .

(Letter of Leibniz to Joh. Bernoulli, March 8/18, 1696)
. . . quod autem . . . vocabulum integralis etiamnum usurpaverim . . .

(Letter of Joh. Bernoulli to Leibniz, April 7, 1696)
And whereas Mr Leibnits præfixes the letter ∫ to the Ordinate of a curve
to denote the Summ of the Ordinates or area of the Curve, I did some years
before represent the same thing by inscribing the Ordinate in a square ....
My symbols therefore . . . are the oldest in the kind.

(Newton, letter to Keill, April 20, 1714)

The integral calculus is, in fact, much older than the differential calculus, because
the computation of areas, surfaces, and volumes occupied the greatest mathematicians
since antiquity: Archimedes, Kepler, Cavalieri, Viviani, Fermat (see Theorem I.3.2),
Gregory St. Vincent, Guldin, Gregory, Barrow. The decisive breakthrough
came when Newton, Leibniz, and Joh. Bernoulli discovered independently
that integration is the inverse operation of differentiation, thus reducing
all efforts of the above researchers to a couple of differentiation rules. The integral
sign is due to Leibniz (1686), the term “integral” is due to Joh. Bernoulli and
was published by his brother Jac. Bernoulli (1690).

Primitives 

For a given function y = f(x) we want to compute the area between the x-axis
and the graph of this function. We fix a point a and denote by z = F(x) the area
under f(x) between a and x (Fig. 4.1a). The crucial fact is then that

(4.1) the function f(x) is the derivative of F(x).

We then call F(x) a primitive of f(x).


FIGURE 4.1a. Newton’s idea    FIG. 4.1b. Leibniz’s idea    FIG. 4.1c. Sketch by Newton¹


1 Reproduced with permission of Editions Albert Blanchard, Paris.

Justification. Newton imagines that the segment BD moves over the area under
consideration (“And conceive these Areas . . . to be generated by the lines BE and
BD, as they move along . . .”, Figs. 4.1a, 4.1c); consequently, if x increases by Δx,
the area increases by Δz = F(x + Δx) - F(x) which, neglecting higher order
terms of Δx, is f(x)Δx (the dark rectangle of Fig. 4.1a). In the limit Δx → 0,
we thus have

(4.2) $dz = f(x)\cdot dx \qquad\text{and}\qquad \frac{dz}{dx} = f(x).$

Leibniz imagines the area as being a sum (later: “integral”) of small rectangles
(Fig. 4.1b):

(4.3) $z_n = f(x_1)\,\Delta x_1 + f(x_2)\,\Delta x_2 + \ldots + f(x_n)\,\Delta x_n.$

This implies that

$z_n - z_{n-1} = f(x_n)\,\Delta x_n,$

and we again get (4.2) when Δx_n → 0. Consequently, the derivative is the inverse
operation of the integral, much as the difference is the inverse operation to
addition.

After long attempts, Leibniz symbolizes the sum in (4.3) (for the limit
Δx_i → 0) by (see Fig. 4.2)

(4.4) $\int f(x)\,dx.$

Nowadays, this area between the bounds a and b is denoted by (Fourier 1822)

(4.5) $\int_a^b f(x)\,dx,$

whereas (4.4), the “indefinite integral”, stands for an arbitrary primitive F(x) of
f(x).


Sed ex iis quae in methodo tangentium exposui, patet esse d, ½xx = x dx; ergo contra
½xx = ∫ x dx (ut enim potestates & radices in vulgaribus calculis, sic nobis
summae & differentiae seu ∫ & d, reciprocae sunt.)

FIGURE 4.2. First publication of the integral sign, an old-style “s” (Leibniz 1686)²

Primitives are not unique; to each primitive F(x) one can add an arbitrary
constant C and F(x) + C is again a primitive of the same function. For C =
-F(a) we obtain the primitive F(x) - F(a), which vanishes for x = a (as does
also the area z). Therefore, the area between a and b is

(4.6) $\int_a^b f(x)\,dx = F(b) - F(a)$

(see the “Fundamental Theorem of Differential Calculus” in Sect. III.6).

2 Reproduced with permission of Bibl. Publ. Univ. Genève.

By reversing differentiation formulas we obtain formulas for primitives. For
example, the function f(x) = x^{n+1} has f'(x) = (n + 1)x^n as derivative. Therefore
x^{n+1}/(n + 1) is a primitive of x^n. This and other formulas of Sect. II.1 are
collected in Table 4.1.


TABLE 4.1. A short table of primitives

$\int x^n\,dx = \frac{x^{n+1}}{n + 1} + C \quad (n \ne -1)$

$\int \frac{1}{x}\,dx = \ln x + C$

$\int e^x\,dx = e^x + C$

$\int \sin x\,dx = -\cos x + C$

$\int \cos x\,dx = \sin x + C$

$\int \frac{1}{1 + x^2}\,dx = \arctan x + C$

$\int \frac{1}{\sqrt{1 - x^2}}\,dx = \arcsin x + C$


Large tables of primitives can be many hundreds of pages long. We mention 
the tables of Gröbner & Hofreiter (1949) and Gradshteyn & Ryzhik (1980). In
recent years this knowledge has been incorporated into many symbolic computer 
systems. 
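Any entry of such a table can be tested by differentiating the claimed primitive. The following sketch does this numerically with a central difference quotient for the entries of Table 4.1 (with n = 3 in the first line); the names `derivative`, `TABLE` and `max_table_error` are illustrative choices, and the test point 0.5 is an arbitrary value inside every domain.

```python
import math

def derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (label, primitive F, integrand f) for the entries of Table 4.1
TABLE = [
    ("x^3", lambda x: x ** 4 / 4, lambda x: x ** 3),
    ("1/x", math.log, lambda x: 1 / x),
    ("e^x", math.exp, math.exp),
    ("sin x", lambda x: -math.cos(x), math.sin),
    ("cos x", math.sin, math.cos),
    ("1/(1+x^2)", math.atan, lambda x: 1 / (1 + x * x)),
    ("1/sqrt(1-x^2)", math.asin, lambda x: 1 / math.sqrt(1 - x * x)),
]

def max_table_error(x=0.5):
    """Largest deviation |F'(x) - f(x)| over all table entries."""
    return max(abs(derivative(F, x) - f(x)) for _, F, f in TABLE)

print(max_table_error())
```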


Applications 

Area of Parabolas. The area under the nth degree parabola y = x^n between a
and b becomes by (4.6) and Table 4.1

(4.7) $\int_a^b x^n\,dx = \frac{x^{n+1}}{n + 1}\Big|_a^b = \frac{b^{n+1} - a^{n+1}}{n + 1},$

where we have used the notation $F(x)\big|_a^b = F(b) - F(a)$. For a = 0 this formula
is Fermat’s Theorem I.3.2.


Area of a Disc. To compute the area of a quarter of a disc we consider the function
$f(x) = \sqrt{1 - x^2}$ for 0 ≤ x ≤ 1. A primitive of f(x) is

(4.8) $F(x) = \frac{x}{2}\sqrt{1 - x^2} + \frac{1}{2}\arcsin x.$

This can be checked by differentiating (4.8). Later we shall see how such formulas
are actually found. Applying (4.6), we thus get

$\text{area of unit disc} = 4\int_0^1 \sqrt{1 - x^2}\,dx = 4\bigl(F(1) - F(0)\bigr) = \pi,$


since sin(π/2) = 1.

There is another elegant way of computing the area of a disc. Nothing forces
us to assume that f(x) dx are slices of small vertical rectangles. Let us cut the disc
(of radius a) into infinitely thin triangles (Kepler 1615, see as well Leibniz’s idea,
Fig. I.4.11). The area of such a triangle, with angle dφ at the center, is

$dS = \frac{a^2\,d\varphi}{2}, \qquad\text{so that}\qquad S = \int_0^{2\pi} \frac{a^2}{2}\,d\varphi = a^2\pi.$

Volume of the Sphere. Consider a sphere of radius a (see Fig. 4.3) and let us cut
it into thin slices (discs of thickness dx and of radius $r = \sqrt{a^2 - x^2}$). The volume
of such a slice is dV = r^2 π dx = (a^2 - x^2) π dx and for the total volume of the
sphere we get

$V = \int_{-a}^{a} (a^2 - x^2)\,\pi\,dx = \Bigl(a^2 x - \frac{x^3}{3}\Bigr)\pi\,\Big|_{-a}^{a} = \frac{4\pi a^3}{3}.$

FIGURE 4.3. Volume of a sphere


Work in a Force Field. Suppose that a force f(s) acts in the direction of a straight
line parameterized by s. The work in moving a body from s to s + Δs is equal to
f(s)Δs (force × length). Therefore, the total work is $\int_a^b f(s)\,ds$.

Example. The gravitational force of the earth on a mass of 1 kg is f(s) =
9.81 · R^2/s^2 [N], if s is the distance to the center. Hence, the energy in moving
1 kg from the surface to infinity is given by

$E = \int_R^\infty f(s)\,ds = \int_R^\infty \frac{9.81\,R^2}{s^2}\,ds = 9.81\,R = 62\cdot 10^6\ [\mathrm{J}].$


Arc Length. 

The fluxion of the Length is determin’d by putting it equal to the square-
root of the sum of the squares of the fluxion of the Absciss and of the
Ordinate. (Newton 1736, Fluxions, p. 130)

We wish to compute the length L of a given curve y(x), a ≤ x ≤ b. If we increase
x by Δx (see Fig. 4.4), the ordinate is increased by Δy = y'(x)Δx (we neglect
higher order terms). Therefore, the length of a small part of the curve is given by
Δs, where

$\Delta s^2 = \Delta x^2 + \Delta y^2 = \bigl(1 + y'(x)^2\bigr)\,\Delta x^2$

(theorem of Pythagoras). For the limit Δx → 0 we obtain

(4.9) $ds = \sqrt{1 + y'(x)^2}\cdot dx \qquad\text{and}\qquad L = \int_a^b \sqrt{1 + y'(x)^2}\,dx.$





FIGURE4.4. Arc length of y = x 1 


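Example. For the parabola y = x^2 we have y' = 2x and the length of the arc
between x = 0 and x = 1 is given by (see (4.27) below)

$L = \int_0^1 \sqrt{1 + 4x^2}\,dx = \Bigl[\frac{x}{2}\sqrt{1 + 4x^2} + \frac{1}{4}\ln\bigl(2x + \sqrt{4x^2 + 1}\bigr)\Bigr]_0^1 = \frac{\sqrt{5}}{2} + \frac{1}{4}\ln\bigl(2 + \sqrt{5}\bigr).$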
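As a plausibility check, the arc length integral of (4.9) can also be approximated numerically and compared with the closed form. The sketch below uses the composite Simpson rule, which is a numerical technique brought in here for illustration and not the method of the text; `simpson` and `arc_length` are hypothetical helper names.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def arc_length(dy, a, b):
    """Length of the curve y(x) on [a, b], given its derivative dy, via (4.9)."""
    return simpson(lambda x: math.sqrt(1 + dy(x) ** 2), a, b)

numeric = arc_length(lambda x: 2 * x, 0.0, 1.0)              # parabola y = x^2
closed = math.sqrt(5) / 2 + math.log(2 + math.sqrt(5)) / 4   # closed form above
print(numeric, closed)
```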


Center of Mass. Consider, for example, two masses m_1, m_2 placed at the points
with abscissas x_1, x_2. The moment applied at the origin is m_1 x_1 + m_2 x_2. The
center of mass $\bar{x}$ is the point where both masses, concentrated, would produce the
same moment, i.e.,

(4.10) $(m_1 + m_2)\cdot\bar{x} = m_1 x_1 + m_2 x_2.$

If the density of a body varies continuously in such a manner that a slice of thickness
dx has the mass m(x) dx, we have, by analogy with (4.10),

(4.11) $\int_a^b m(x)\,dx\cdot\bar{x} = \int_a^b x\,m(x)\,dx \qquad\text{and}\qquad \bar{x} = \frac{\int_a^b x\,m(x)\,dx}{\int_a^b m(x)\,dx}.$

Example. For a triangle formed by the straight line y = cx, 0 ≤ x ≤ a, we have

(4.12) $m(x) = cx, \qquad \bar{x} = \frac{\int_0^a c\,x^2\,dx}{\int_0^a c\,x\,dx} = \frac{a^3/3}{a^2/2} = \frac{2a}{3}.$

Remark. For a random variable X with “density function” f(x) (which satisfies
$\int_{-\infty}^{\infty} f(x)\,dx = 1$), the value $\bar{x} = \int_{-\infty}^{\infty} x\,f(x)\,dx$ is the average of X.


Integration Techniques 


We shall now explain some general techniques for finding a primitive. A systematic
approach for some important classes of functions will be presented in
Sect. II.5.

A first observation is that integration is a linear operation, i.e.,

(4.13) $\int \bigl(c_1 f_1(x) + c_2 f_2(x)\bigr)\,dx = c_1\int f_1(x)\,dx + c_2\int f_2(x)\,dx.$

This follows at once from the fact that differentiation is linear (see (1.3)).

Substitution of a New Variable. Suppose that

F(z) is a primitive of f(z),

i.e., F'(z) = f(z), and consider the substitution z = g(x), which transforms the
variable z into x. It then follows from (1.16) that

F(g(x)) is a primitive of f(g(x)) g'(x).

Consequently, we have

(4.14) $\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(z)\,dz$

because, by (4.6), both terms are equal to F(g(b)) - F(g(a)). The expression to
the left is obtained by substituting z = g(x) in f(z) and dz = g'(x) dx.

Geometric Interpretation. We want to compute

$\int_0^{1.5} \frac{4x}{1 + x^2}\,dx$

and use the substitution z = x^2. Since dz = 2x dx, we obtain from Eq. (4.14)

$\int_0^{1.5} \frac{2}{1 + x^2}\cdot 2x\,dx = \int_0^{2.25} \frac{2}{1 + z}\,dz = 2\ln(1 + z)\Big|_0^{2.25} = 2\ln(1 + x^2)\Big|_0^{1.5} = 2\ln(3.25).$

Fig. 4.5 illustrates the transformation z = x^2 and the functions 4x/(1 + x^2) and
2/(1 + z). Points x and x + Δx are mapped to z = x^2 and z + Δz = x^2 + 2xΔx +
Δx^2. Therefore, the shaded rectangles have, for Δx → 0, the same areas, and both
integrals in (4.14) give the same value.
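The equality of the two integrals in this example can be observed numerically. The following sketch sums thin rectangles (midpoint rule) for both integrands; the helper `midpoint` and the variable names are illustrative assumptions, not notation from the text.

```python
import math

def midpoint(f, a, b, n=20000):
    """Midpoint rule: sum of n thin rectangles of width (b - a)/n."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# left-hand side of (4.14): integrand in the variable x on [0, 1.5]
left = midpoint(lambda x: 4 * x / (1 + x * x), 0.0, 1.5)
# right-hand side of (4.14): integrand in z = x^2 on [0, 2.25]
right = midpoint(lambda z: 2 / (1 + z), 0.0, 2.25)
exact = 2 * math.log(3.25)
print(left, right, exact)
```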

Examples. All the art consists in finding a “good” substitution. This will be
demonstrated in a series of examples.

For functions of the form f(ax + b) the substitution z = ax + b is often
useful. For example, with z = 5x + 2, dz = 5 dx, we have

(4.15) $\int e^{5x+2}\,dx = \int e^z\,\frac{dz}{5} = \frac{1}{5}\,e^z = \frac{1}{5}\,e^{5x+2}.$

Sometimes the presence of the factor g'(x) for the substitution z = g(x)
can easily be recognized. For example, in the integral below the factor x suggests
using z = -x^2, dz = -2x dx and we obtain

(4.16) $\int x\,e^{-x^2}\,dx = -\frac{1}{2}\int e^z\,dz = -\frac{1}{2}\,e^z = -\frac{1}{2}\,e^{-x^2}.$

From Table 4.1 we obtain the integrals of $1/(1 + x^2)$ or $1/\sqrt{1 - x^2}$. If we
want to find a primitive for, say, $1/(7 + x^2)$ or $1/\sqrt{7 - x^2}$ we use the substitution
$x^2 = 7z^2$ or $x = \sqrt{7}\,z$, $dx = \sqrt{7}\,dz$. This yields

(4.17) $\int \frac{dx}{7 + x^2} = \int \frac{\sqrt{7}\,dz}{7(1 + z^2)} = \frac{1}{\sqrt{7}}\arctan z = \frac{1}{\sqrt{7}}\arctan\frac{x}{\sqrt{7}}.$

Quadratic expressions x^2 + 2bx + c are often simplified by restoring a complete
square as (x + b)^2 + (c - b^2) followed by the substitution z = x + b. In
this way the following integral is reduced, by the substitution z = x + 1/2, to the
integral in (4.17):

(4.18) $\int \frac{dx}{x^2 + x + 1} = \int \frac{dz}{z^2 + 3/4} = \frac{2}{\sqrt{3}}\arctan\frac{2z}{\sqrt{3}} = \frac{2}{\sqrt{3}}\arctan\Bigl(\frac{2x + 1}{\sqrt{3}}\Bigr).$

As a last example, we consider the function (x + 2)/(x^2 + x + 1). Here we
write (Euler 1768, § 62) the numerator as x + 2 = (x + 1/2) + 3/2 so that the first
part x + 1/2 is a scalar multiple of the derivative of the denominator. This part of
the integral is then computed with the substitution z = x^2 + x + 1. The second
part is a multiple of (4.18), and we obtain

(4.19) $\int \frac{x + 2}{x^2 + x + 1}\,dx = \frac{1}{2}\ln(x^2 + x + 1) + \sqrt{3}\arctan\Bigl(\frac{2x + 1}{\sqrt{3}}\Bigr).$


Integration by Parts. A second integration technique is based on the differentiation
rule for products (1.4). Integrating the formula (uv)' = u'v + uv' gives
$u(x)v(x) = \int \bigl(u'(x)v(x) + u(x)v'(x)\bigr)\,dx$, or equivalently

(4.20) $\int u'(x)\,v(x)\,dx = u(x)\,v(x) - \int u(x)\,v'(x)\,dx.$

In this formula, one integral is replaced by another. However, if the factors u' and
v are properly chosen, the second integral can be easier to evaluate than the first
one.

Examples. Let us try to compute $\int x\sin x\,dx$. It would be no use choosing u'(x) =
x (u(x) = x^2/2) and v(x) = sin x because then the second integral would be even
more difficult to evaluate. Therefore, we choose u'(x) = sin x (u(x) = -cos x)
and v(x) = x. Equation (4.20) then gives

(4.21) $\int x\sin x\,dx = -x\cos x + \int 1\cdot\cos x\,dx = -x\cos x + \sin x.$

Sometimes it is necessary to repeat the integration by parts. In the following
example, we first put v(x) = x^2, u'(x) = e^x, and for the second integration by
parts we put v(x) = x, u'(x) = e^x:

(4.22) $\int x^2 e^x\,dx = x^2 e^x - 2\int x\,e^x\,dx = e^x\,(x^2 - 2x + 2).$

Functions such as ln x or arctan x have simple derivatives. They will be
frequently used in the role of v(x):

(4.23) $\int \ln x\,dx = \int 1\cdot\ln x\,dx = x\ln x - \int \frac{x}{x}\,dx = x(\ln x - 1),$

(4.24) $\int \arctan x\,dx = x\arctan x - \int \frac{x}{1 + x^2}\,dx = x\arctan x - \frac{1}{2}\ln(1 + x^2).$

Here, the last integral is evaluated with the substitution z = 1 + x^2, dz = 2x dx.

Consider next the integral $\int \sqrt{1 + 4x^2}\,dx$, which we encountered in the
computation of the parabola’s arc length. Integration by parts with u'(x) = 1,
$v(x) = \sqrt{1 + 4x^2}$ yields

(4.25) $\int \sqrt{1 + 4x^2}\,dx = x\sqrt{1 + 4x^2} - \int \frac{4x^2}{\sqrt{1 + 4x^2}}\,dx.$

Here, the second integral does not look much better than the first one. However,
the numerator can be written as 4x^2 = (1 + 4x^2) - 1. The integral can then
be split into two parts, one of which is $-\int \sqrt{1 + 4x^2}\,dx$ (the integral we are
looking for) and can be transferred to the left side; the other resembles the last
integral of Table 4.1: the derivative of arsinh z is $1/\sqrt{1 + z^2}$ and we have, with
the substitution z = 2x (see Exercise I.4.3),

(4.26) $\int \frac{dx}{\sqrt{1 + 4x^2}} = \frac{1}{2}\int \frac{dz}{\sqrt{1 + z^2}} = \frac{1}{2}\,\mathrm{arsinh}(2x) = \frac{1}{2}\ln\bigl(2x + \sqrt{4x^2 + 1}\bigr).$

This gives, for (4.25),

(4.27) $\int \sqrt{1 + 4x^2}\,dx = \frac{x}{2}\sqrt{1 + 4x^2} + \frac{1}{4}\ln\bigl(2x + \sqrt{4x^2 + 1}\bigr).$

Recurrence Relations. Suppose we want to compute

(4.28) $I_n = \int \sin^n x\,dx.$

We put u'(x) = sin x, v(x) = sin^{n-1} x and apply integration by parts. This yields

$\int \sin^n x\,dx = -\cos x\,\sin^{n-1} x + (n - 1)\int \cos^2 x\,\sin^{n-2} x\,dx.$

We insert cos^2 x = 1 - sin^2 x and the right integral can be split into the two
integrals I_{n-2} and I_n. Putting I_n on the left side, we obtain (1 + n - 1) I_n =
-cos x sin^{n-1} x + (n - 1) I_{n-2}, or

(4.29) $I_n = -\frac{1}{n}\cos x\,\sin^{n-1} x + \frac{n - 1}{n}\,I_{n-2}.$

This recurrence relation can be used to reduce the computation of I_n to that of
$I_1 = \int \sin x\,dx = -\cos x$ (if n is odd), or to that of $I_0 = \int dx = x$ (if n is even).
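The recurrence (4.29) translates directly into a recursive function. The sketch below evaluates the primitive I_n at a point x, starting from I_0 = x and I_1 = -cos x; the function name `I` and the sample values are illustrative.

```python
import math

def I(n, x):
    """Primitive of sin^n evaluated at x, obtained from the recurrence (4.29)."""
    if n == 0:
        return x
    if n == 1:
        return -math.cos(x)
    return -math.cos(x) * math.sin(x) ** (n - 1) / n + (n - 1) / n * I(n - 2, x)

# definite integrals over [0, pi/2] via (4.6): I_n(pi/2) - I_n(0)
for n in range(2, 6):
    print(n, I(n, math.pi / 2) - I(n, 0.0))
```

For n = 2 and n = 3 the printed values can be compared with π/4 and 2/3, which follow by hand from the same recurrence.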
As a further example, consider the integral

(4.30) $J_n = \int \frac{dx}{(1 + x^2)^n}.$

In the absence of a better idea, let us apply integration by parts with u'(x) = 1
and v(x) = 1/(1 + x^2)^n:

$J_n = \frac{x}{(1 + x^2)^n} + n\int \frac{2x^2}{(1 + x^2)^{n+1}}\,dx.$

Using the same trick as in (4.25), we write in the last integral 2x^2 = 2(1 + x^2) - 2
and obtain

$J_n = \frac{x}{(1 + x^2)^n} + 2n\,J_n - 2n\,J_{n+1}.$

We are unlucky because the index n, instead of becoming smaller, became larger.
But this is of no importance: we reverse the formula and get

(4.31) $J_{n+1} = \frac{1}{2n}\,\frac{x}{(1 + x^2)^n} + \frac{2n - 1}{2n}\,J_n.$

This relation reduces the computation of (4.30) to that of J_1 = arctan x.


Taylor’s Formula with Remainder 

Joh. Bernoulli (1694b, “Effectiones omnium quadraturam . . .”) computed integrals
by repeated integration by parts and obtained “generalissimam” series similar
to those found later by Taylor. Cauchy (1821) then discovered that this method,
cleverly modified, leads precisely to Taylor’s series of a function f with the error
term expressed by an integral.

The idea is to write (see (4.6))

$f(x) = f(a) + \int_a^x 1\cdot f'(t)\,dt$

and to apply integration by parts with u'(t) = 1 and v(t) = f'(t). The crucial fact
is that we put u(t) = -(x - t) (x is a constant) instead of u(t) = t. We thus get

$f(x) = f(a) - (x - t)\,f'(t)\Big|_a^x + \int_a^x (x - t)\,f''(t)\,dt = f(a) + (x - a)\,f'(a) + \int_a^x (x - t)\,f''(t)\,dt.$

In the next step, we put u(t) = -(x - t)^2/2! and v(t) = f''(t) to obtain

$f(x) = f(a) + (x - a)\,f'(a) + \frac{(x - a)^2}{2!}\,f''(a) + \int_a^x \frac{(x - t)^2}{2!}\,f'''(t)\,dt.$

Continuing this procedure, we arrive at the desired result:


(4.32) $f(x) = \sum_{i=0}^{k} \frac{(x - a)^i}{i!}\,f^{(i)}(a) + \int_a^x \frac{(x - t)^k}{k!}\,f^{(k+1)}(t)\,dt.$

Example. For f(x) = e^x, f^{(i)}(x) = e^x, and a = 0 Eq. (4.32) becomes

(4.33) $e^x = 1 + x + \frac{x^2}{2!} + \ldots + \frac{x^k}{k!} + \int_0^x \frac{(x - t)^k}{k!}\,e^t\,dt.$

You might now be astonished at seeing the error of the series expressed by an integral,
after having had all these difficulties in evaluating such integrals. If the integral
in (4.33) is computed by the above skillful methods, one obtains, of course,
simply $e^x - \sum_{i=0}^{k} x^i/i!$, which will be of no help at all. The idea is to replace
the integrand in (4.33) by something simpler. For example, if we suppose that
0 ≤ x ≤ 1, then 0 ≤ t ≤ 1 too, and the function e^t lies between the bounds 1
and 3. It therefore appears convincing (this will later be Theorem III.5.14) that the
corresponding area will also lie between

$\int_0^x \frac{(x - t)^k}{k!}\,dt = \frac{x^{k+1}}{(k + 1)!} \qquad\text{and}\qquad \int_0^x \frac{(x - t)^k}{k!}\cdot 3\,dt = \frac{3\,x^{k+1}}{(k + 1)!}.$

This allows the conclusion that, say, for k = 10 the error is smaller than 10^{-7}.
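The two bounds for the remainder can be compared with the actual error of the partial sum. The following is a small sketch for 0 ≤ x ≤ 1; `taylor_exp` and `remainder_bounds` are illustrative names, and `math.exp` serves as the reference value.

```python
import math

def taylor_exp(x, k):
    """Partial sum 1 + x + x^2/2! + ... + x^k/k! of the series for e^x."""
    term, total = 1.0, 1.0
    for i in range(1, k + 1):
        term *= x / i
        total += term
    return total

def remainder_bounds(x, k):
    """Lower and upper bound for the integral remainder in (4.33), 0 <= x <= 1."""
    base = x ** (k + 1) / math.factorial(k + 1)
    return base, 3 * base

x, k = 1.0, 10
error = math.exp(x) - taylor_exp(x, k)
lower, upper = remainder_bounds(x, k)
print(lower, error, upper)
```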


Exercises

4.1 Let a curve be given in parametric representation x(t), y(t). Show that its arc
length for a ≤ t ≤ b is

$L = \int_a^b \sqrt{x'(t)^2 + y'(t)^2}\,dt.$

Compute the arc length of the cycloid (3.18) for 0 ≤ t ≤ 2π.

4.2 Compute the integrals 

. f xdx , f dx 


b) 


d )/_ n =/^-, e) [x>e 
J sin ar J 

g) J e ax cos Øx dx, h) J e ax 



6x + 13' 


Hints. For (d) reverse Eq. (4.29), for (e) write x^3 = x · x^2, for (g) and (h)
do either integration by parts or decompose $\int e^{(\alpha + i\beta)x}\,dx$ into its real and
imaginary parts.

4.3 Show by repeated integration by parts that for integer values m and n

(4.34) $\int_a^b \frac{(b - x)^m\,(x - a)^n}{m!\,n!}\,dx = \frac{(b - a)^{m+n+1}}{(m + n + 1)!},$

in particular

$\int_{-1}^{1} (1 - x^2)^n\,dx = \frac{2\cdot 2\cdot 4\cdot 6\cdots 2n}{1\cdot 3\cdot 5\cdots(2n + 1)}.$

II.5 Functions with Elementary Integral

The above quantity

$\frac{pp\,a\,ds}{qq\,ss - pp\,aa}$

reduces immediately, without any change, to two logarithmical fractions,
by separating it thus:

$\frac{pp\,a\,ds}{qq\,ss - pp\,aa} = \frac{\frac{1}{2}p\,ds}{qs - pa} - \frac{\frac{1}{2}p\,ds}{qs + pa}$

(Annex to a letter of Joh. Bernoulli 1699, see Briefwechsel, vol. 1, p. 212)
Problem 3: If X denotes an arbitrary rational function of x, describe a
method by which the expression X dx can be integrated.

(Euler 1768, Opera Omnia, vol. XI, p. 28)

In the preceding section, we learned some techniques of integration. Here, we will
use these techniques systematically in order to establish the fact that the integrals
of several classes of functions are elementary. Elementary functions are functions
composed of polynomials, rational, exponential, logarithmic, trigonometric, and
inverse trigonometric functions.


Integration of Rational Functions 


Let R(x) = P(x)/Q(x) be a rational function (P(x) and Q(x) polynomials). We
shall present a constructive proof of the fact that $\int R(x)\,dx$ is elementary. The
computation of a primitive will be carried out in three steps:

- reduction to the case deg P < deg Q (deg P denotes the degree of P(x));

- factorization of Q(x) and decomposition of R(x) into partial fractions; and

- integration of the partial fractions.


Reduction to the Case deg P < deg Q. A first simplification of the function
R(x) can be achieved if deg P ≥ deg Q. In this situation, we divide P by Q and
obtain

(5.1) $\frac{P(x)}{Q(x)} = S(x) + \frac{\hat{P}(x)}{Q(x)},$

where S(x) and $\hat{P}(x)$ are polynomials (quotient and remainder) with deg $\hat{P}$ <
deg Q. As an example, consider

(5.2) $\frac{P(x)}{Q(x)} = \frac{2x^6 - 3x^5 - 9x^4 + 23x^3 + x^2 - 44x + 39}{x^5 + x^4 - 5x^3 - x^2 + 8x - 4}.$

We first remove the term 2x^6 by subtracting 2x Q(x) from P(x), then we add
5 Q(x) to P(x) and arrive at

(5.3) $\frac{P(x)}{Q(x)} = 2x - 5 + \frac{6x^4 - 20x^2 + 4x + 19}{x^5 + x^4 - 5x^3 - x^2 + 8x - 4}.$
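The division step can be carried out mechanically. The following sketch performs polynomial long division with exact rational arithmetic on coefficient lists (highest degree first) and reproduces the quotient and remainder of the example; `poly_divmod` and the list representation are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def poly_divmod(p, q):
    """Divide polynomial p by q; coefficients are listed from the highest
    degree down. Returns (quotient, remainder) as coefficient lists."""
    p = [Fraction(c) for c in p]
    quotient = []
    while len(p) >= len(q):
        c = p[0] / q[0]            # next coefficient of the quotient
        quotient.append(c)
        for i in range(len(q)):    # subtract c * q, aligned at the leading term
            p[i] -= c * q[i]
        p.pop(0)                   # the leading coefficient is now zero
    return quotient, p

P = [2, -3, -9, 23, 1, -44, 39]    # numerator of (5.2)
Q = [1, 1, -5, -1, 8, -4]          # denominator of (5.2)
S, R = poly_divmod(P, Q)
print([int(c) for c in S], [int(c) for c in R])
```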


The polynomial S(x) is readily integrated so that only the second term in (5.1)
requires further investigation.

Decomposition into Partial Fractions. We assume that a factorization of Q(x)
into linear terms is known:

(5.4) $Q(x) = (x - \alpha_1)^{m_1}(x - \alpha_2)^{m_2}\cdot\ldots\cdot(x - \alpha_k)^{m_k} = \prod_{i=1}^{k} (x - \alpha_i)^{m_i}.$

Here, α_1, ..., α_k are the (possibly complex) distinct roots of Q(x) and the m_i are
their corresponding multiplicities. The following lemma shows how our rational
function can be written as a linear combination of simple fractions, so-called partial
fractions. This idea goes back to the correspondence between Joh. Bernoulli
and Leibniz (around 1700), and was systematically exploited by Joh. Bernoulli
(1702), Leibniz (1702), Euler (1768, Caput I, Problema 3), and Hermite (1873).


(5.1) Lemma. Let Q(x) be given by (5.4) and let P(x) be a polynomial satisfying
deg P < deg Q. Then there exist constants C_{ij} such that

(5.5) $\frac{P(x)}{Q(x)} = \sum_{i=1}^{k}\sum_{j=1}^{m_i} \frac{C_{ij}}{(x - \alpha_i)^j}.$

Proof. We eliminate one factor of Q(x) after another as follows: we write Q(x) =
(x - α)^m q(x), where α is a root of Q(x) and q(α) ≠ 0. We will show the existence
of a constant C and of a polynomial p(x) of degree < deg Q - 1 such that

(5.6) $\frac{P(x)}{(x - \alpha)^m\,q(x)} = \frac{C}{(x - \alpha)^m} + \frac{p(x)}{(x - \alpha)^{m-1}\,q(x)},$

or equivalently (multiply by the common denominator),

(5.7) $P(x) = C\cdot q(x) + p(x)\cdot(x - \alpha).$

By putting x = α, this formula motivates the choice

(5.8) $C = P(\alpha)/q(\alpha).$

The polynomial p(x) is obtained from a division of P(x) - C · q(x) by the factor
(x - α). The same procedure is then recursively applied to the right expression of
(5.6) and we obtain the desired decomposition (5.5). □

Example. The polynomial Q(x) of (5.2) has the factorization

(5.9) $Q(x) = x^5 + x^4 - 5x^3 - x^2 + 8x - 4 = (x - 1)^3 (x + 2)^2.$

Applying (5.7) and (5.8) with α = -2 and m = 2, we obtain, for (5.6),

$\frac{6x^4 - 20x^2 + 4x + 19}{(x - 1)^3 (x + 2)^2} = \frac{-1}{(x + 2)^2} + \frac{6x^3 - 11x^2 - x + 9}{(x - 1)^3 (x + 2)}.$

A second application with α = -2 and m = 1 gives

(5.10) $\frac{6x^4 - 20x^2 + 4x + 19}{(x - 1)^3 (x + 2)^2} = \frac{-1}{(x + 2)^2} + \frac{3}{x + 2} + \frac{3x^2 - 8x + 6}{(x - 1)^3}.$

In the last expression, we replace x = (x - 1) + 1 so that 3x^2 - 8x + 6 =
3(x - 1)^2 - 2(x - 1) + 1, and (5.10) becomes, finally,

(5.11) $\frac{6x^4 - 20x^2 + 4x + 19}{(x - 1)^3 (x + 2)^2} = \frac{1}{(x - 1)^3} + \frac{-2}{(x - 1)^2} + \frac{3}{x - 1} + \frac{-1}{(x + 2)^2} + \frac{3}{x + 2}.$
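One elimination step of the proof of Lemma 5.1 can be written as a short routine. The sketch below computes C from (5.8) and then p(x) by dividing P(x) - C q(x) by (x - α) with Horner's scheme, using exact fractions; `horner` and `peel` are hypothetical helper names, and polynomials are coefficient lists with the highest degree first.

```python
from fractions import Fraction

def horner(p, x):
    """Evaluate the polynomial p (highest degree first) at x."""
    value = Fraction(0)
    for c in p:
        value = value * x + c
    return value

def peel(P, q, alpha):
    """One step of (5.6)-(5.8): return C and p with P = C*q + p*(x - alpha)."""
    C = horner(P, alpha) / horner(q, alpha)
    n = max(len(P), len(q))
    num = [Fraction(0)] * n                 # coefficients of P - C*q
    for i, c in enumerate(P):
        num[n - len(P) + i] += c
    for i, c in enumerate(q):
        num[n - len(q) + i] -= C * c
    p, carry = [], Fraction(0)              # synthetic division by (x - alpha);
    for c in num[:-1]:                      # the remainder vanishes by the choice of C
        carry = carry * alpha + c
        p.append(carry)
    return C, p

q = [1, -3, 3, -1]                          # (x - 1)^3
C1, p1 = peel([6, 0, -20, 4, 19], q, -2)    # first step of the example
C2, p2 = peel(p1, q, -2)                    # second step
print(C1, [int(c) for c in p1], C2, [int(c) for c in p2])
```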

Second Possibility. By Lemma 5.1, we know that

(5.12) $\frac{6x^4 - 20x^2 + 4x + 19}{(x - 1)^3 (x + 2)^2} = \frac{A_0}{(x - 1)^3} + \frac{A_1}{(x - 1)^2} + \frac{A_2}{x - 1} + \frac{B_0}{(x + 2)^2} + \frac{B_1}{x + 2}.$

The coefficients A_i and B_i can be computed as follows: we multiply Eq. (5.12) by
(x - 1)^3 so that

$\frac{6x^4 - 20x^2 + 4x + 19}{(x + 2)^2} = A_0 + A_1 (x - 1) + A_2 (x - 1)^2 + (x - 1)^3 g(x),$

with some function g(x) well defined in a neighborhood of x = 1. Hence, the A_i
are the first coefficients of the Taylor series of P(x)/(x + 2)^2 (see Sect. II.2) and
satisfy

$A_i = \frac{1}{i!}\,\frac{d^i}{dx^i}\Bigl(\frac{6x^4 - 20x^2 + 4x + 19}{(x + 2)^2}\Bigr)\Big|_{x=1},$

i.e., A_0 = 1, A_1 = -2, A_2 = 3. In a similar way, we get

$B_i = \frac{1}{i!}\,\frac{d^i}{dx^i}\Bigl(\frac{6x^4 - 20x^2 + 4x + 19}{(x - 1)^3}\Bigr)\Big|_{x=-2},$

i.e., B_0 = -1, B_1 = 3.


Integration of Partial Fractions. The individual terms in the decomposition (5.5)
can easily be integrated by using the formulas of Sect. II.4 (see Table 4.1):

(5.13) $\int \frac{dx}{(x - \alpha)^j} = \begin{cases} -\dfrac{1}{(j - 1)(x - \alpha)^{j-1}} & \text{if } j > 1 \\[1ex] \ln(x - \alpha) & \text{if } j = 1. \end{cases}$

Combining Eqs. (5.3), (5.9), and (5.11), we thus obtain, for our example,

$\int \frac{P(x)}{Q(x)}\,dx = x^2 - 5x - \frac{1}{2(x - 1)^2} + \frac{2}{x - 1} + 3\ln(x - 1) + \frac{1}{x + 2} + 3\ln(x + 2) + C.$

If all roots of Q(x) are real (i.e., the α_i of (5.4) are real) then the C_{ij} in
(5.5) are real and we have expressed the integral as a linear combination of real
functions. But nothing prevents us from applying the above reduction process also
in the case where Q(x) has complex roots.

Example with Complex Roots. Suppose we want to compute $\int (1 + x^4)^{-1}\,dx$.
Since the roots of x^4 + 1 = 0 are $\alpha_1 = (1 + i)/\sqrt{2}$, $\alpha_2 = (1 - i)/\sqrt{2}$, $\alpha_3 =
(-1 + i)/\sqrt{2}$, $\alpha_4 = (-1 - i)/\sqrt{2}$, the decomposition of Lemma 5.1 leads to

(5.14) $\frac{1}{1 + x^4} = \frac{A}{x - (1 + i)/\sqrt{2}} + \frac{B}{x - (1 - i)/\sqrt{2}} + \frac{C}{x + (1 - i)/\sqrt{2}} + \frac{D}{x + (1 + i)/\sqrt{2}}.$

By (5.8), we get

$A = \frac{1}{(\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3)(\alpha_1 - \alpha_4)} = \frac{1}{i\sqrt{2}\cdot\sqrt{2}\cdot(\sqrt{2} + i\sqrt{2})} = \frac{\sqrt{2}}{8}\,(-1 - i),$

and similarly $B = (-1 + i)\sqrt{2}/8$, $C = (1 - i)\sqrt{2}/8$, $D = (1 + i)\sqrt{2}/8$. Hence,

(5.15) $\int \frac{dx}{1 + x^4} = A\ln\bigl(x - (1 + i)/\sqrt{2}\bigr) + B\ln\bigl(x - (1 - i)/\sqrt{2}\bigr) + C\ln\bigl(x + (1 - i)/\sqrt{2}\bigr) + D\ln\bigl(x + (1 + i)/\sqrt{2}\bigr).$

Using (I.5.11) and the relation

$\arctan u + \arctan(1/u) = \begin{cases} \pi/2 & \text{if } u > 0 \\ -\pi/2 & \text{if } u < 0, \end{cases}$

which follows from (I.4.5) or from (I.4.32), we have

/ x-lf+tf) = ln(x + + i 


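and the right-hand side of expression (5.15) can be written

(5.16) $\int \frac{dx}{1 + x^4} = \frac{\sqrt{2}}{8}\ln\frac{x^2 + \sqrt{2}\,x + 1}{x^2 - \sqrt{2}\,x + 1} + \frac{\sqrt{2}}{4}\Bigl(\arctan(x\sqrt{2} + 1) + \arctan(x\sqrt{2} - 1)\Bigr).$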
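A result such as (5.16) is easily tested by differentiation. The sketch below evaluates the right-hand side of (5.16) and compares a central difference quotient with the integrand 1/(1 + x^4); `primitive_quartic` and `residual` are illustrative names.

```python
import math

def primitive_quartic(x):
    """Right-hand side of (5.16), a primitive of 1/(1 + x^4)."""
    r2 = math.sqrt(2)
    log_part = math.log((x * x + r2 * x + 1) / (x * x - r2 * x + 1))
    atan_part = math.atan(r2 * x + 1) + math.atan(r2 * x - 1)
    return r2 / 8 * log_part + r2 / 4 * atan_part

def residual(x, h=1e-5):
    """Difference between the numerical derivative of (5.16) and 1/(1 + x^4)."""
    slope = (primitive_quartic(x + h) - primitive_quartic(x - h)) / (2 * h)
    return slope - 1 / (1 + x ** 4)

for x in (-2.0, -0.3, 0.0, 0.7, 3.0):
    print(x, residual(x))
```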


Avoiding Complex Arithmetic. Whenever complex arithmetic is not desired, we
can proceed as follows: suppose that the polynomial Q(x) has l distinct complex
conjugate pairs of roots α_1 ± iβ_1, ..., α_l ± iβ_l and k distinct real roots γ_1, ..., γ_k.
Then, we have the real factorization

(5.17) $Q(x) = \prod_{i=1}^{l} \bigl((x - \alpha_i)^2 + \beta_i^2\bigr)^{m_i}\,\prod_{i=1}^{k} (x - \gamma_i)^{n_i},$

where m_i and n_i denote the multiplicities of the roots. A real version of Lemma 5.1
is then as follows:

(5.2) Lemma. Let Q(x) be given by (5.17) and let P(x) be a polynomial with real
coefficients satisfying deg P < deg Q. Then, there exist real constants A_{ij}, B_{ij},
and C_{ij} such that

(5.18) $\frac{P(x)}{Q(x)} = \sum_{i=1}^{l}\sum_{j=1}^{m_i} \frac{A_{ij} + B_{ij}\,x}{\bigl((x - \alpha_i)^2 + \beta_i^2\bigr)^j} + \sum_{i=1}^{k}\sum_{j=1}^{n_i} \frac{C_{ij}}{(x - \gamma_i)^j}.$

Proof. The real roots can be treated as in the proof of Lemma 5.1. For the treatment
of the complex roots we write Q(x) = ((x - α)^2 + β^2)^m q(x), where α + iβ is
a root of Q(x) and q(α ± iβ) ≠ 0. Then, there exist real constants A, B and a
polynomial p(x) of degree < deg Q - 2 such that

$\frac{P(x)}{\bigl((x - \alpha)^2 + \beta^2\bigr)^m\,q(x)} = \frac{A + Bx}{\bigl((x - \alpha)^2 + \beta^2\bigr)^m} + \frac{p(x)}{\bigl((x - \alpha)^2 + \beta^2\bigr)^{m-1}\,q(x)}.$

To see this, we consider the equivalent equation

$P(x) = (A + Bx)\cdot q(x) + p(x)\cdot\bigl((x - \alpha)^2 + \beta^2\bigr).$

By putting x = α ± iβ, this formula yields A and B, and the polynomial p(x) is
obtained from a division of P(x) - (A + Bx) · q(x) by the factor ((x - α)^2 + β^2).
As in the proof of Lemma 5.1, the formula (5.18) is then obtained by induction on
the degree of Q(x). □


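For the integration of the general term of (5.18) we write it as

$\frac{A + Bx}{\bigl((x - \alpha)^2 + \beta^2\bigr)^j} = \frac{B(x - \alpha)}{\bigl((x - \alpha)^2 + \beta^2\bigr)^j} + \frac{A + B\alpha}{\bigl((x - \alpha)^2 + \beta^2\bigr)^j}.$

The first term of this sum can immediately be integrated with the help of the
substitution z = (x - α)^2 + β^2, dz = 2(x - α) dx. For the second term we use
the substitution z = (x - α)/β and obtain the integral (4.30) of Sect. II.4. Hence,
for j = 1 we have

$\int \frac{A + Bx}{(x - \alpha)^2 + \beta^2}\,dx = \frac{B}{2}\ln\bigl((x - \alpha)^2 + \beta^2\bigr) + \frac{A + B\alpha}{\beta}\arctan\Bigl(\frac{x - \alpha}{\beta}\Bigr),$

and for j > 1

$\int \frac{A + Bx}{\bigl((x - \alpha)^2 + \beta^2\bigr)^j}\,dx = \frac{-B}{2(j - 1)\bigl((x - \alpha)^2 + \beta^2\bigr)^{j-1}} + \frac{A + B\alpha}{\beta^{2j-1}}\,J_j\Bigl(\frac{x - \alpha}{\beta}\Bigr),$

where J_1(z) = arctan z and

(5.19) $J_{j+1}(z) = \frac{1}{2j}\,\frac{z}{(1 + z^2)^j} + \frac{2j - 1}{2j}\,J_j(z).$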

Example. For the function of Eq. (5.14), Lemma 5.2 gives the decomposition

$\frac{1}{1 + x^4} = \frac{1}{(x^2 + \sqrt{2}\,x + 1)(x^2 - \sqrt{2}\,x + 1)} = \frac{A + Bx}{x^2 + \sqrt{2}\,x + 1} + \frac{C + Dx}{x^2 - \sqrt{2}\,x + 1}.$

Multiplication of this relation by $(x^2 + \sqrt{2}\,x + 1)$ and insertion of $x = (-1 \pm i)/\sqrt{2}$
yields

$\frac{1}{2 \mp 2i} = A + B\,\frac{(-1 \pm i)}{\sqrt{2}},$

and A = 1/2, $B = \sqrt{2}/4$ is obtained by comparing real and imaginary parts of
this relation. The constants C = 1/2 and $D = -\sqrt{2}/4$ are obtained analogously.
Using the above formulas we get (5.16) again.

Remark. Decomposition into partial fractions renewed the interest of the mathe- 
maticians of the 18th century for the roots of polynomials and for algebra. 


Useful Substitutions 

We now exploit the above result and present several substitutions that lead to fur- 
ther classes of functions whose indefinite integrals are elementary functions. In 
the rest of this section, R denotes a rational function with one, two, or three argu- 
ments. 

Integrals of the Form $\int R(\sqrt[n]{ax+b},\, x)\,dx$. An obvious substitution is

(5.20)    $\sqrt[n]{ax+b} = u, \qquad x = \frac{u^n - b}{a}, \qquad dx = \frac{n}{a}\cdot u^{n-1}\cdot du,$

with which we get

    $\int R(\sqrt[n]{ax+b},\, x)\,dx = \frac{n}{a}\int R\Bigl(u, \frac{u^n - b}{a}\Bigr)\,u^{n-1}\,du = \int \widetilde R(u)\,du,$

where $\widetilde R(u)$ is a rational function. This last integral can be computed with the
techniques explained above.

Integrals of the Form $\int R(e^{\lambda x})\,dx$. The obvious substitution $u = e^{\lambda x}$ gives
$du = \lambda e^{\lambda x}\,dx$ and $dx = du/(\lambda u)$, and the resulting integral is that of a rational
function.

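Example.

    $\int \frac{dx}{2 + \sinh x} = \int \frac{dx}{2 + (e^x - e^{-x})/2} = 2\int \frac{du}{u^2 + 4u - 1}$

    $\qquad = 2\int \frac{du}{(u+2)^2 - 5} = \frac{1}{\sqrt5}\,\ln\frac{u + 2 - \sqrt5}{u + 2 + \sqrt5} = \frac{1}{\sqrt5}\,\ln\frac{e^x + 2 - \sqrt5}{e^x + 2 + \sqrt5}.$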

Here we have used the formula of Exercise 5.1 below. 

Integrals of the Form $\int R(\sin x, \cos x, \tan x)\,dx$. We know from antiquity
(Pythagoras 570-501 B.C., see also R.C. Buck 1980, Sherlock Holmes in Babylon,
Am. Math. Monthly vol. 87, Nr. 5, p. 335-345) that the triples (3, 4, 5), (5, 12, 13),
(7, 24, 25), ..., satisfy $a^2 + b^2 = c^2$ and are of the form $(u, (u^2-1)/2, (u^2+1)/2)$.
This suggests the substitution (Euler 1768, Caput V, §261)

(5.21)    $\sin x = \frac{2u}{1+u^2}, \qquad \cos x = \frac{1-u^2}{1+u^2}, \qquad \tan x = \frac{2u}{1-u^2}.$

One verifies that $\sin x = u(1 + \cos x)$, so that
the point $(\cos x, \sin x)$ lies at the intersection
of the line $\eta = u(1 + \xi)$ with the unit circle
(see the figure). Consequently, we have $u = \tan(x/2)$, $x = 2\arctan u$, and

    $dx = \frac{2}{1+u^2}\,du.$

All this inserted into $\int R(\sin x, \cos x, \tan x)\,dx$
provides an integral of a rational function.

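Example.

    $\int \frac{dx}{2 + \sin x} = \int \frac{2\,du}{(1+u^2)\bigl(2 + \frac{2u}{1+u^2}\bigr)} = \int \frac{du}{u^2 + u + 1}.$

The last integral is known from Eq. (4.18), thus,

    $\int \frac{dx}{2 + \sin x} = \frac{2}{\sqrt3}\,\arctan\Bigl(\frac{2}{\sqrt3}\Bigl(u + \frac12\Bigr)\Bigr) = \frac{2}{\sqrt3}\,\arctan\Bigl(\frac{2}{\sqrt3}\Bigl(\tan\frac{x}{2} + \frac12\Bigr)\Bigr).$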
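
A result of this kind is easily verified by differentiation. As a hedged numerical illustration (the helper names and the step size are our own choices), the following Python sketch compares a central difference quotient of the primitive found above with the integrand $1/(2 + \sin x)$ for $-\pi < x < \pi$, where $u = \tan(x/2)$ is finite.

```python
import math


def primitive(x):
    """Primitive of 1/(2 + sin x) from the example, for -pi < x < pi."""
    u = math.tan(x / 2.0)
    return 2.0 / math.sqrt(3.0) * math.atan(2.0 / math.sqrt(3.0) * (u + 0.5))


def integrand(x):
    return 1.0 / (2.0 + math.sin(x))


def central_difference(F, x, h=1e-5):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


for x in (-2.0, -0.5, 0.0, 1.0, 2.5):
    # the two printed values should agree to several digits
    print(x, central_difference(primitive, x), integrand(x))
```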


Integrals of the Form $\int R(\sqrt{ax^2 + 2bx + c},\, x)\,dx$. The idea (Euler 1768, §88)
is to define a new variable $z$ by the relation $ax^2 + 2bx + c = a(x - z)^2$. This
yields the substitution

(5.22)    $x = \frac{az^2 - c}{2(b + az)}, \qquad dx = \frac{a(az^2 + 2bz + c)}{2(b + az)^2}\,dz,$

          $\sqrt{ax^2 + 2bx + c} = \pm\sqrt a\,(z - x) = \pm\sqrt a\cdot\frac{az^2 + 2bz + c}{2(b + az)},$

          $z = x \pm \sqrt{ax^2 + 2bx + c}\,/\sqrt a,$

and we again get an integral of a rational function. For $a < 0$ this leads to complex
arithmetic, which can be avoided by the transformation of Exercise 5.3.

Sometimes it is more convenient to transform the expression $\sqrt{ax^2 + 2bx + c}$
by a suitable linear substitution $z = \alpha x + \beta$ into one of the forms

    $\sqrt{z^2 + 1}, \qquad \sqrt{z^2 - 1}, \qquad \sqrt{1 - z^2}.$

Then, the substitutions

(5.23)    $z = \sinh u, \qquad z = \cosh u, \qquad z = \sin u$

can be applied to eliminate the square root in the integral.

Example. Consider again the integral (4.27). Putting $x = \sinh u$, we get



For the inverse function of $x = \sinh u$ see Exercise I.4.3.


5.1 (Joh. Bernoulli, see quotation at the beginning of this section). I 


5.2 Show that J , xj dx is an elementary function. 

5.3 (Euler 1768, Caput II, §88). Suppose that $ax^2 + 2bx + c$ has distinct real
roots $\alpha$, $\beta$. Show that the substitution $z^2 = a(x - \beta)/(x - \alpha)$ transforms the
integral

    $\int R\bigl(\sqrt{ax^2 + 2bx + c},\, x\bigr)\,dx$

(R is a rational function of two arguments) into $\int \widetilde R(z)\,dz$, where $\widetilde R$ is rational.

5.4 Mr. C.L. Ever simplifies Eq. (5.16) with the help of (I.4.32) to

    $\int \frac{dx}{x^4 + 1} = \frac{\sqrt2}{8}\,\ln\frac{x^2 + \sqrt2\,x + 1}{x^2 - \sqrt2\,x + 1} + \frac{\sqrt2}{4}\,\arctan\frac{x\sqrt2}{1 - x^2}$

and obtains, e.g.,

    $\int_0^{\sqrt2} \frac{dx}{x^4 + 1} = \frac{\sqrt2}{8}\,\ln 5 + \frac{\sqrt2}{4}\,\arctan(-2) = -0.1069250677,$

a negative value for the integral of a positive function. Where did he make a
mistake and what is the correct value?

5.5 Compute

    $\int \frac{dx}{\sqrt{x^2 + 1}}$

twice; once with the substitution (5.22) and once with the substitution (5.23).
This leads to the formula $\operatorname{arsinh} x = \ln(x + \sqrt{x^2 + 1})$ (see Exercise I.4.3).

5.6 Prove that

    $\int R(\sin^2 x, \cos^2 x, \tan x)\,dx$

can be integrated with the substitution

II.6 Approximate Computation of Integrals

. . . because after all these attempts, analysts have finally concluded that one
must abandon all hope of expressing elliptical arcs with the use of algebraic
formulas, logarithms and circular arcs.

(Lambert 1772, Rectification elliptischer Bogen . . Opera vol. I, p. 312) 
Although the problem of numerical quadrature is about two hundred years 
old and has been considered by many geometers: Newton, Cotes, Gauss, 
Jacobi, Hermite, Tchébychef, Christoffel, Heine, Radeau [sic], A. Markov, 
T. Stitjes [sic], C. Possé, C. Andréev, N. Sonin and others, it can neverthe- 
less not be considered sufficiently exhausted. (Steklov 1918) 

One easily convinces oneself by our method that the integral $\int \frac{e^x\,dx}{x}$,
which has greatly occupied geometers, is impossible in finite form . . .

(Liouville 1835, p. 113) 


In spite of the extraordinary results of the previous sections, many integrals re- 
sisted the ingenuity of the Bernoullis, of Euler, of Lagrange, and of many others. 
Amongst these integrals, we note 



The last three are so-called “elliptic integrals”. Legendre, Abel, Jacobi, and Weierstrass
devote a great deal of their work to the study of these integrals. The above
integrals cannot be expressed in finite terms of elementary functions (Liouville
1835, see quotation), and we are confronted with new functions that have to be
computed with new methods.

We consider three approaches: (1) series expansions; (2) approximation by 
polynomials (numerical integration); and (3) asymptotic expansions. 

Series Expansions 

The idea is to develop the function into a series (either in terms of powers of x, or
in terms of other expressions) and to integrate term by term. A justification of this
procedure will be given in Sect. III.5 below.

Historical Examples. The computations of Mercator (see Eq. (I.3.13))

    $\ln(1 + x) = \int \frac{dx}{1 + x} = \int \bigl(1 - x + x^2 - \ldots\bigr)\,dx = x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots$

are the oldest example. The computation of the length of an arc of the circle
$y = \sqrt{1 - x^2}$ (see Eq. (4.9) and Theorem I.2.2)



is precisely Newton’s approach to Eq. (I.4.25).

Perimeter of the Ellipse. We wish to compute the perimeter of the ellipse with
semiaxes 1 and $b$:

    $x^2 + \frac{y^2}{b^2} = 1 \qquad$ or $\qquad x = \cos t, \quad y = b\sin t.$

Since $dx = -\sin t\,dt$ and $dy = b\cos t\,dt$, the perimeter is

(6.1)    $P = \int_0^{2\pi} \sqrt{dx^2 + dy^2} = 4\int_0^{\pi/2} \sqrt{\sin^2 t + b^2\cos^2 t}\;dt = 4\int_0^{\pi/2} \sqrt{1 - \alpha\cos^2 t}\;dt, \qquad \alpha = 1 - b^2.$

This is an “elliptic integral” (whence the name), which is not elementary. We
compute it as follows: suppose that $1 > b > 0$, thus $0 < \alpha < 1$. The idea is to use
Newton’s series for $\sqrt{1 - x}$ (Theorem I.2.2),

(6.2)    $\sqrt{1 - x} = 1 - \frac{1}{2}\,x - \frac{1\cdot1}{2\cdot4}\,x^2 - \frac{1\cdot1\cdot3}{2\cdot4\cdot6}\,x^3 - \ldots,$

which gives

(6.3)    $P = 4\int_0^{\pi/2} \Bigl(1 - \frac{\alpha}{2}\cos^2 t - \frac{1\cdot1}{2\cdot4}\,\alpha^2\cos^4 t - \ldots\Bigr)\,dt.$

With the techniques of Sect. II.4 (see Eq. (4.28)), we find that

    $\int_0^{\pi/2} \cos^{2n} t\,dt = \frac{1\cdot3\cdot5\cdot\ldots\cdot(2n-1)}{2\cdot4\cdot6\cdot\ldots\cdot(2n)}\cdot\frac{\pi}{2}$

and (6.3) becomes (cf. Euler 1750, Opera, vol. XX, p. 49)

(6.4)    $P = 2\pi\Bigl(1 - \frac{1}{2}\cdot\frac{1}{2}\,\alpha - \frac{1\cdot1}{2\cdot4}\cdot\frac{1\cdot3}{2\cdot4}\,\alpha^2 - \frac{1\cdot1\cdot3}{2\cdot4\cdot6}\cdot\frac{1\cdot3\cdot5}{2\cdot4\cdot6}\,\alpha^3 - \ldots\Bigr).$

The convergence of this formula is illustrated in Fig. 6.1. For $\alpha = 0$ (i.e., $b = 1$)
we have a circle, and $P = 2\pi$. For $\alpha = 1$ (i.e., $b = 0$) the series converges very
slowly to the correct value, 4.
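
The series for the perimeter is well suited for a small experiment. The following Python sketch is an illustration under our own naming (the function `ellipse_perimeter_series` and its parameter `terms` do not appear in the text); it sums the products of the coefficients of Newton’s series (6.2) with the values of $\int_0^{\pi/2}\cos^{2n} t\,dt$, each coefficient being obtained from the preceding one.

```python
import math


def ellipse_perimeter_series(b, terms=50):
    """Partial sum of the perimeter series for semiaxes 1 and b (0 <= b <= 1)."""
    alpha = 1.0 - b * b
    total = 1.0
    newton = 1.0   # absolute value of the coefficient of x^n in sqrt(1 - x)
    wallis = 1.0   # 1*3*...*(2n-1) / (2*4*...*(2n))
    power = 1.0    # alpha^n
    for n in range(1, terms + 1):
        # coefficients 1/2, 1/(2*4), 1*3/(2*4*6), ... of Newton's series
        newton = 0.5 if n == 1 else newton * (2 * n - 3) / (2 * n)
        wallis *= (2 * n - 1) / (2 * n)
        power *= alpha
        total -= newton * wallis * power
    return 2.0 * math.pi * total


# few terms suffice for a nearly circular ellipse, many are needed for b near 0
for b in (1.0, 0.8, 0.2):
    print(b, ellipse_perimeter_series(b, 10), ellipse_perimeter_series(b, 400))
```

The terms behave like $\alpha^n/n^2$, which explains the slow convergence for $\alpha$ close to 1.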


Fresnel’s Integrals. The Fresnel Integrals (Fresnel 1818),

(6.5)    $x(t) = \int_0^t \cos(u^2)\,du, \qquad y(t) = \int_0^t \sin(u^2)\,du,$

have interesting properties (Exercise 6.4) and produce, in the $(x, y)$ plane, a beautiful
spiral (Fig. 6.2). They are not elementary. However, the functions $\sin(u^2)$ and
$\cos(u^2)$ have a simple infinite series (the series of $\sin z$ and $\cos z$ where $z = u^2$;
see (I.4.16) and (I.4.17)), of which we evaluate the integral term by term, as follows:

    $y(t) = \int_0^t \Bigl(u^2 - \frac{u^6}{3!} + \frac{u^{10}}{5!} - \ldots\Bigr)\,du = \frac{t^3}{3} - \frac{t^7}{7\cdot3!} + \frac{t^{11}}{11\cdot5!} - \ldots$

    $x(t) = \int_0^t \Bigl(1 - \frac{u^4}{2!} + \frac{u^8}{4!} - \ldots\Bigr)\,du = t - \frac{t^5}{5\cdot2!} + \frac{t^9}{9\cdot4!} - \frac{t^{13}}{13\cdot6!} + \ldots$

The convergence of these series is illustrated in Fig. 6.3. The results are excellent
for small values of $t$. For increasing values of $|t|$, more and more terms need to be
taken into account.
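
The behaviour just described is easy to observe in floating point arithmetic. The following Python sketch (the name `fresnel_series` and the default number of terms are our own assumptions) evaluates the partial sums of the two power series for $x(t)$ and $y(t)$.

```python
import math


def fresnel_series(t, terms=20):
    """Partial sums of the power series of x(t) and y(t) in (6.5)."""
    x = 0.0
    y = 0.0
    for k in range(terms):
        sign = -1.0 if k % 2 else 1.0
        # term of x(t): (-1)^k t^(4k+1) / ((4k+1) (2k)!)
        x += sign * t ** (4 * k + 1) / ((4 * k + 1) * math.factorial(2 * k))
        # term of y(t): (-1)^k t^(4k+3) / ((4k+3) (2k+1)!)
        y += sign * t ** (4 * k + 3) / ((4 * k + 3) * math.factorial(2 * k + 1))
    return x, y


# for larger t, the partial sums with few terms are useless
for t in (0.5, 1.0, 2.0, 3.0):
    print(t, fresnel_series(t, 3), fresnel_series(t, 20))
```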



FIGURE 6.3. Fresnel’s Integrals by power series; the numbers 5, 9, 13 and 7, 11, 15 indicate
the last power of t taken into account


Numerical Methods 

Suppose we want to compute the integral $\int_a^b f(x)\,dx$, where the integration interval
is given. The idea is the following: we fix $N$, subdivide the interval $[a, b]$ into
$N$ subintervals of length $h = (b - a)/N$,

    $x_0 = a, \quad x_1 = a + h, \quad \ldots, \quad x_i = a + ih, \quad \ldots, \quad x_N = b,$

and replace the function $f(x)$ locally by polynomials that can easily be integrated.






Trapezoidal Rule. On the interval $[x_i, x_{i+1}]$, the function $f(x)$ is replaced by
a straight line passing through $(x_i, f(x_i))$ and $(x_{i+1}, f(x_{i+1}))$. The integral
between $x_i$ and $x_{i+1}$ is then approximated by the trapezoidal area
$h\cdot(f(x_i) + f(x_{i+1}))/2$ and we obtain

    $\int_a^b f(x)\,dx \approx \sum_{i=0}^{N-1} \frac{h}{2}\bigl(f(x_i) + f(x_{i+1})\bigr)$

(6.6)    $\qquad = h\Bigl(\frac{f(x_0)}{2} + f(x_1) + f(x_2) + \ldots + f(x_{N-1}) + \frac{f(x_N)}{2}\Bigr).$

Example. The upper pictures of Fig. 6.4 show the functions $\cos x^2$ and $\sin x^2$ together
with the trapezoidal approximations (step size $h = 0.5$, $N = 10$). The
points of the lower pictures represent approximations to Fresnel’s Integrals obtained
with $h = 1/2$ and $h = 1/8$; the corresponding values are connected by
straight lines.
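
Formula (6.6) translates directly into a few lines of code. The following Python sketch is a minimal illustration (the function name `trapezoid` and its argument order are our own choices); it is applied to $\cos x^2$ on $[0, 5]$ with the two step sizes of the example.

```python
import math


def trapezoid(f, a, b, n):
    """Trapezoidal rule (6.6) with n subintervals of length h = (b - a)/n."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))      # the two end points carry weight 1/2
    for i in range(1, n):
        total += f(a + i * h)        # interior points carry weight 1
    return h * total


def cos_square(x):
    return math.cos(x * x)


# step sizes h = 1/2 and h = 1/8 on the interval [0, 5]
print(trapezoid(cos_square, 0.0, 5.0, 10))
print(trapezoid(cos_square, 0.0, 5.0, 40))
```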



FIGURE 6.4. Fresnel’s Integrals by the Trapezoidal Rule 


Simpson’s Method (named after Simpson 1743). The idea is to choose three successive
values of $f(x_i)$ ($y_i = f(x_i)$) and to compute the parabola of interpolation
through these points (see Theorem I.1.2 and Eq. (2.6)):

    $p(x) = y_0 + (x - x_0)\,\frac{\Delta y_0}{h} + \frac{(x - x_0)(x - x_1)}{2}\,\frac{\Delta^2 y_0}{h^2}.$

With the substitution $x = x_0 + th$, the area between the x-axis and this parabola
becomes

(6.7)    $\int_{x_0}^{x_2} p(x)\,dx = 2h\cdot y_0 + h\int_0^2 t\,dt\cdot\Delta y_0 + h\int_0^2 \frac{t(t-1)}{2}\,dt\cdot\Delta^2 y_0 = \frac{h}{3}\,(y_0 + 4y_1 + y_2).$

We find Simpson’s Rule (N even)

(6.8)    $\int_a^b f(x)\,dx \approx \frac{h}{3}\bigl[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + 2f(x_4) + \ldots + f(x_N)\bigr].$
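
Simpson’s Rule can be programmed as follows. This Python sketch is only an illustration (the name `simpson` and the error handling are our own choices); the weights 4 and 2 alternate on the interior points.

```python
def simpson(f, a, b, n):
    """Simpson's rule (6.8) with an even number n of subintervals."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0   # 4 at odd indices, 2 at even indices
        total += weight * f(a + i * h)
    return h * total / 3.0


# the integral of 1/x from 1 to 10 with N = 12 and N = 24
print(simpson(lambda x: 1.0 / x, 1.0, 10.0, 12))
print(simpson(lambda x: 1.0 / x, 1.0, 10.0, 24))
```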


Newton-Cotes Methods. Taking higher degree interpolation polynomials, we
find, in the same way,

    $\int_{x_0}^{x_3} f(x)\,dx \approx \frac{3h}{8}\bigl(f(x_0) + 3f(x_1) + 3f(x_2) + f(x_3)\bigr)$

    $\int_{x_0}^{x_4} f(x)\,dx \approx \frac{2h}{45}\bigl(7f(x_0) + 32f(x_1) + 12f(x_2) + 32f(x_3) + 7f(x_4)\bigr),$

and so on. The first one, due to Newton (1671), is called the 3/8-rule. In 1711,
Cotes computed these formulas for all degrees up to 10 (see Goldstine 1977,
p. 77).

Numerical Examples. We compute approximations of $\int_1^{10} \frac{dx}{x} = \ln(10)$ with the
above methods for N = 12, 24, 48, .... The results are presented in Table 6.1. We
observe a genuine improvement only in every second column (for an explanation,
see Exercise 6.5).


TABLE 6.1. Computation of $\int_1^{10} \frac{dx}{x}$ with different quadrature formulas

   N     Trapezoid    Simpson            Newton             Cotes

   12    2.34         2.307              2.31               2.305
   24    2.31         2.303              2.303              2.3027
   48    2.305        2.3026             2.3026             2.30259
   96    2.303        2.302587           2.30259            2.3025852
  192    2.3027       2.3025852          2.3025854          2.302585095
  384    2.3026       2.3025851          2.3025851          2.3025850930
  768    2.3025       2.302585093        2.302585094        2.3025850929947
 1536    2.302587     2.3025850930       2.3025850930       2.30258509299405
 3072    2.3025858    2.302585092996     2.302585092999     2.3025850929940458
 6144    2.3025852    2.3025850929941    2.3025850929943    2.302585092994045686
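
An experiment of the kind shown in Table 6.1 can be repeated with a short program. The following Python sketch is an assumption-laden illustration (the function `composite_rule` and the two weight lists are our own packaging of Newton’s 3/8-rule and of the Cotes formula of degree 4); since it works in double precision, it cannot reproduce the last digits of the table.

```python
import math

# weights and factor of a rule on m subintervals:
# integral over [x_0, x_m] is approximated by factor * h * sum(w_k * f(x_k))
NEWTON = ([1, 3, 3, 1], 3.0 / 8.0)
COTES = ([7, 32, 12, 32, 7], 2.0 / 45.0)


def composite_rule(rule, f, a, b, n):
    """Apply the rule repeatedly on n subintervals (n a multiple of m)."""
    weights, factor = rule
    m = len(weights) - 1
    if n % m != 0:
        raise ValueError("n must be a multiple of %d" % m)
    h = (b - a) / n
    total = 0.0
    for start in range(0, n, m):
        for k, w in enumerate(weights):
            total += w * f(a + (start + k) * h)
    return factor * h * total


for n in (12, 24, 48, 96):
    newton = composite_rule(NEWTON, lambda x: 1.0 / x, 1.0, 10.0, n)
    cotes = composite_rule(COTES, lambda x: 1.0 / x, 1.0, 10.0, n)
    print(n, newton - math.log(10.0), cotes - math.log(10.0))
```

The values $N = 12, 24, 48, \ldots$ are convenient because they are multiples of 2, 3 and 4, so that all rules can be applied to the same subdivision.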


An interesting phenomenon can be observed when applying the trapezoidal
rule to the elliptic integral $P = \int_0^{2\pi} \sqrt{1 - \alpha\cos^2 t}\;dt$ (here with $b = 0.2$,
$\alpha = 0.96$, see Table 6.2). It converges much better than expected. The reason is that the
function $f(t)$ is periodic and the “superconvergence” is explained by the Euler-Maclaurin
formula of Sect. II.10.

TABLE 6.2. Computation of an elliptic integral with the trapezoidal rule

N Trapezoid 

12 4.1 
24 4.201 
48 4.2020080 
96 4.20200890792 
192 4.20200890793780018891 
384 4.2020089079378001889398329176947477824 
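
The rapid convergence for this periodic integrand can be observed with a few lines of code. In the following Python sketch (the name `perimeter_trapezoid` is our own, and double precision limits us to about 15 digits), the two end points of the trapezoidal rule coincide by periodicity, so that the rule reduces to $h$ times the sum over $N$ equidistant points.

```python
import math


def perimeter_trapezoid(b, n):
    """Trapezoidal rule with n points for the perimeter of the ellipse."""
    alpha = 1.0 - b * b
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        # f(0) = f(2 pi), hence the two half weights merge into one full weight
        total += math.sqrt(1.0 - alpha * math.cos(i * h) ** 2)
    return h * total


for n in (12, 24, 48, 96):
    print(n, perimeter_trapezoid(0.2, n))
```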


Asymptotic Expansions 

This method was used by Laplace (1812) for $\int_0^x e^{-t^2}\,dt$ (see Oeuvres, tome VII,
p. 104 and Exercise 6.7) and by Cauchy in 1842 for Fresnel’s integrals (see Kline
1972, p. 1100). Whereas series expansions and numerical methods are useful for
small and moderate values of $x$, the method of asymptotic expansions is especially
adapted for large $x$.

We illustrate this technique on the example of Fresnel’s integrals. For the
limiting case $x \to \infty$ the exact value of the integral is known to be (Exercise
IV.5.14)

(6.9)    $\int_0^\infty \cos t^2\,dt = \int_0^\infty \sin t^2\,dt = \frac12\sqrt{\frac{\pi}{2}}.$

The idea is now to split the integral according to $\int_0^x = \int_0^\infty - \int_x^\infty$, i.e.,

(6.10)    $\int_0^x \cos t^2\,dt = \frac12\sqrt{\frac{\pi}{2}} - \int_x^\infty \cos t^2\,dt.$

To the integral on the right, we artificially add the factors $2t$ and $1/(2t)$ and apply
integration by parts with $u(t) = 1/t$, $v(t) = \sin t^2$. This yields

    $-\int_x^\infty \cos t^2\,dt = -\frac12\int_x^\infty \frac1t\cdot 2t\cos t^2\,dt = \frac12\,\frac1x\,\sin x^2 - \frac12\int_x^\infty \frac{1}{t^2}\,\sin t^2\,dt.$

We find an integral that appears by no means easier than the first one. However,
for $x$ large, the integral on the right, which contains the additional factor $1/t^2$,
is much smaller than the original one. Therefore, $(2x)^{-1}\sin x^2$ will be a good
approximation for $-\int_x^\infty \cos t^2\,dt$. If the precision is not yet good enough, we
repeat the same procedure (here with $u(t) = 1/t^3$ and $v(t) = -\cos t^2$),

(6.11)    $-\frac12\int_x^\infty \frac{1}{t^2}\,\sin t^2\,dt = -\frac{1}{2\cdot2}\int_x^\infty \frac{1}{t^3}\cdot 2t\sin t^2\,dt = -\frac{1}{2\cdot2}\,\frac{1}{x^3}\,\cos x^2 + \frac{1\cdot3}{2\cdot2}\int_x^\infty \frac{1}{t^4}\,\cos t^2\,dt.$

Continuing like this, we find from (6.10) that


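(6.12)    $\int_0^x \cos t^2\,dt = \frac12\sqrt{\frac{\pi}{2}} + \frac12\,\frac1x\,\sin x^2 - \frac{1}{2\cdot2}\,\frac{1}{x^3}\,\cos x^2 - \frac{1\cdot3}{2\cdot2\cdot2}\,\frac{1}{x^5}\,\sin x^2$

          $\qquad + \frac{1\cdot3\cdot5}{2\cdot2\cdot2\cdot2}\,\frac{1}{x^7}\,\cos x^2 + \frac{1\cdot3\cdot5\cdot7}{2\cdot2\cdot2\cdot2\cdot2}\,\frac{1}{x^9}\,\sin x^2 - \ldots$

An analogous formula is valid for

(6.13)    $\int_0^x \sin t^2\,dt = \frac12\sqrt{\frac{\pi}{2}} - \frac12\,\frac1x\,\cos x^2 - \frac{1}{2\cdot2}\,\frac{1}{x^3}\,\sin x^2 + \frac{1\cdot3}{2\cdot2\cdot2}\,\frac{1}{x^5}\,\cos x^2 + \ldots$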

The extraordinary precision of these approximations for large x is illustrated in 
Fig. 6.5. The numbers 1, 3, 5 indicate the last power of 1/x taken into account. 
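
The quality of these approximations can be tested against the power series of the integral. The following Python sketch is an illustration under our own naming (`fresnel_cos_asymptotic`, `fresnel_cos_series` and the parameter `last_power` are not from the text); it truncates the expansion of $\int_0^x \cos t^2\,dt$ after a given odd power of $1/x$, the signs following the pattern $+, -, -, +, +, -, -, \ldots$ and the coefficients being $\frac12, \frac{1}{2\cdot2}, \frac{1\cdot3}{2\cdot2\cdot2}, \ldots$

```python
import math


def fresnel_cos_asymptotic(x, last_power=5):
    """Asymptotic expansion of the integral of cos(t^2) over [0, x],
    keeping the terms in 1/x, 1/x^3, ..., 1/x^last_power."""
    total = 0.5 * math.sqrt(math.pi / 2.0)
    s, c = math.sin(x * x), math.cos(x * x)
    coeff = 0.5
    power = 1
    k = 0
    while power <= last_power:
        trig = s if k % 2 == 0 else c            # sin, cos, sin, cos, ...
        sign = 1.0 if k % 4 in (0, 3) else -1.0  # +, -, -, +, +, -, -, ...
        total += sign * coeff * trig / x ** power
        coeff *= power / 2.0                     # next factor 1/2, 3/2, 5/2, ...
        power += 2
        k += 1
    return total


def fresnel_cos_series(x, terms=40):
    """Power series of the same integral, used here as a reference value."""
    total = 0.0
    for k in range(terms):
        total += (-1.0) ** k * x ** (4 * k + 1) / ((4 * k + 1) * math.factorial(2 * k))
    return total


for x in (2.0, 3.0, 4.0):
    reference = fresnel_cos_series(x)
    print(x, [fresnel_cos_asymptotic(x, p) - reference for p in (1, 3, 5)])
```

For larger $x$ the power series itself suffers from cancellation of large terms in floating point arithmetic, so the comparison is meaningful only for moderate values of $x$.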



(6.1) Remark. The error of the truncated series (6.12) can easily be estimated. For
example, if we truncate after the term $(2x)^{-1}\sin x^2$, the above derivation shows
that the error is given by the value of the integral in (6.11) (taken over $x \le t < \infty$).
Using $|\cos t^2| \le 1$ this yields the estimate $(2x^3)^{-1}$, which, for $x > 2$, is less than
0.0625.

(6.2) Remark. The infinite series (6.12) and (6.13) do not converge for a fixed $x$.
The reason is that the general term contains the factor $1\cdot3\cdot5\cdot7\cdot9\cdot\ldots$ in the
numerator, which dominates all other factors. Such series were called asymptotic
expansions by Poincaré.


Exercises 

6.1 (Joh. Bernoulli 1697). Derive the “series mirabili” 





Hint. Use the series for the exponential function in $x^x = e^{x\ln x}$ and compute
$\int x^n(\ln x)^n\,dx$ by integration by parts.

6.2 The integral $\int x^2\,dx/\sqrt{1 - x^4}$ was encountered by Jac. Bernoulli in his computation
of the elastic line and by Leibniz in his study of the Isochrona Paracentrica.
Verify the formula (Leibniz 1694b)

    $\int \frac{x^2\,dx}{\sqrt{1 - x^4}} = \frac{1}{3}\,x^3 + \frac{1}{7\cdot2\cdot1}\,x^7 + \frac{1\cdot3}{11\cdot4\cdot1\cdot2}\,x^{11} + \frac{1\cdot3\cdot5}{15\cdot8\cdot1\cdot2\cdot3}\,x^{15} + \ldots$


6.3 As in (6.7), derive the formulas of Newton and Cotes by integrating the interpolation
polynomials of degree 3 and 4 on the intervals $[x_0, x_3]$ and $[x_0, x_4]$,
respectively.

6.4 For the curve defined by (6.5) (see Fig. 6.2) prove that

a) the length of the arc between the origin and $(x(t), y(t))$ is equal to $t$; and

b) the radius of curvature at the point $(x(t), y(t))$ is equal to $1/(2t)$.

6.5 Prove that Simpson’s method is exact for all polynomials of degree 3. 

6.6 Compute 


with the help of Simpson’s method. Study the decrease of the error with 
increasing N. 

Result. The correct value is $(\pi/8)\ln 2 = 0.2721982613$.


6.7 Using $\int_0^\infty e^{-t^2}\,dt = \sqrt\pi/2$ (see (IV.5.41) below), derive an asymptotic expansion
for the error function $\Phi(x) = \frac{2}{\sqrt\pi}\int_0^x e^{-t^2}\,dt$ that is valid for large
values of $x$ (Laplace 1812, Livre premier, No. 44).

Result. $\Phi(x) = 1 - \frac{e^{-x^2}}{\sqrt\pi\,x}\Bigl(1 - \frac{1}{2x^2} + \frac{1\cdot3}{2^2x^4} - \frac{1\cdot3\cdot5}{2^3x^6} + \ldots\Bigr).$

6.8 Compute numerically the integral

    $\int_0^\infty \frac{1}{\sqrt x}\,\cos x^2\,dx = \frac{\pi\sqrt2\,\sqrt{2 + \sqrt2}}{4\cdot\Gamma(3/4)} = 1.674813394.$

Choose two numbers $A \approx 1/10$ and $B \approx 10$ and compute the integral

a) on the interval $(0, A]$ by a series;

b) on the interval $[A, B]$ by Simpson’s method; and

c) on the interval $[B, \infty)$ by an asymptotic expansion.

II.7 Ordinary Differential Equations

Ergo & horum integralia aequantur. (Jac. Bernoulli 1690) 
In Sects. II.4 and II.5, we treated the problem of finding a primitive of a given
function $f(x)$, i.e., we were looking for a function $y(x)$ satisfying $y'(x) = f(x)$.
Here, we consider the more difficult problem where the function $f$ may also depend
on the unknown function $y(x)$. An ordinary differential equation is a relation
of the form

(7.1)    $y' = f(x, y).$

We are searching for a function $y(x)$ such that $y'(x) = f(x, y(x))$ for all $x$ in a
certain interval. Let us begin with some historical examples (for more details, see
Wanner 1988).

The Isochrone of Leibniz. Galilei discovered that a body, falling from the origin
along the y-axis, increases its velocity according to $v = \sqrt{-2gy}$, where $g$ is the
acceleration due to gravity. During his dispute with the Cartesians about mechan- 
ics, Leibniz (in the Sept. 1687 issue of the journal Nouvelles de la République des 
lettres ) poses the following problem: find a curve y(x) (see Fig. 7.1) such that, 
when the body is sliding along this curve, its vertical velocity dy/dt is everywhere 
equal to a given constant —b. 



FIGURE 7.1. Leibniz’s isochrone


One month later, “Vir Celeberrimus Christianus Hugenius” (Huygens) gives 
the solution, “sed suppressa demonstratione & explicatione”. The “demonstratio”, 
then published in Leibniz (1689), is unsatisfactory, since the solution is guessed 
and then shown to possess the desired property. A general method for finding the 
solution with the help of the “modern” differential calculus was then published 
by Jac. Bernoulli (1690). This started the era of spectacular discoveries made by 
Jac. and Joh. Bernoulli, later by Euler and Daniel Bernoulli, and made Basel for 
several decades the world center of mathematical research.

Let us write Galilei’s formula as

(7.2)    $\Bigl(\frac{ds}{dt}\Bigr)^2 = \frac{dx^2 + dy^2}{dt^2} = -2gy \qquad$ ($s$ = arclength),

divide by $(dy/dt)^2 = b^2$ (which is the required condition), and obtain

(7.3)    $\Bigl(\frac{dx}{dy}\Bigr)^2 + 1 = \frac{-2gy}{b^2} \qquad$ or $\qquad \frac{dy}{dx} = \frac{-1}{\sqrt{-1 - 2gy/b^2}},$

a differential equation as in (7.1). In order to understand Bernoulli’s idea, we write
(7.3) as

(7.4)    $dx = -\sqrt{-1 - \frac{2gy}{b^2}}\;dy,$

which expresses the fact (see Fig. 7.1) that the two striped rectangles always have
the same area. So Jacob writes “Ergo & horum Integralia aequantur” (this is the
first appearance in mathematics of the word “integral”), meaning that the areas $S_1$
and $S_2$ also have to be equal. After integrating, we find the solution

    $x = \frac{b^2}{3g}\Bigl(-1 - \frac{2gy}{b^2}\Bigr)^{3/2},$

and the “Solutio sit linea paraboloeides quadrato cubica . . .” (Leibniz).


The Tractrix. 

The distinguished Parisian physician Claude Perrault, equally famous for 
his work in mechanics and in architecture, well known for his edition of 
Vitruvius, and in his lifetime an important member of the Royal French 
Academy of Science, proposed this problem to me and to many others be- 
fore me, readily admitting that he had not been able to solve it . . . 

(Leibniz 1693) 

While Leibniz was in Paris (1672-1676) taking mathe- 
matical lessons from Huygens, the famous anatomist and 
architect Claude Perrault formulated the following prob- 
lem: for which curve is the tangent at each point P of 
constant length a between P and the x-axis (Fig. 7.2)? To 
illustrate this question, he took out of his fob a “horologio
portabili suae thecae argenteae” and pulled it across
the table. He mentioned that no mathematician from
Paris or Toulouse (Fermat) was able to find the formula. 

Leibniz published his solution in 1693 (see Leibniz 1693), asserting that he
had known it for quite some time: as

(7.5)    $\frac{dy}{dx} = -\frac{y}{\sqrt{a^2 - y^2}}, \qquad$ i.e., $\qquad -\frac{\sqrt{a^2 - y^2}}{y}\;dy = dx,$

one finds (“ergo & horum . . .”) the solution by quadrature (Figs. 7.2 or 7.3). Leibniz
asserts that it was “a well-known fact” that this area is expressible with the logarithm,
which, using the substitution $\sqrt{a^2 - y^2} = v$, $a^2 - y^2 = v^2$, $-y\,dy = v\,dv$,
turns out to be true (see also Exercise 7.1). We mention that Leibniz’s interest in
this theory also went the other way around: use Perrault’s watch as a mechanical
integration machine for the computation of integral (7.5) (and hence of logarithms)
and design other mechanical devices for similar integrals.

The Catenary. 

But to better judge the quality of your algorithm I wait impatiently to see
the results you have obtained concerning the shape of the hanging rope or
chain, which Mr. Bernouilly proposed that you investigate, for which I am
very grateful to him, because this curve possesses remarkable properties. I 
considered it long ago in my youth, when I was only 15 years old, and I 
proved to Father Mersenne that it was not a parabola . . . 

(Letter of Huygens to Leibniz, Oct. 9, 1690) 
The efforts of my brother were without success, I myself was more fortu- 
nate, since I found the way . . . It is true that this required meditation which 
robbed me of sleep for an entire night . . . 

(Joh. Bernoulli, see Briefwechsel, vol. 1, p. 98)

Galilei (1638) asserted that a chain hanging from two nails forms “ad unguem” 
a parabola. Some 20 years later, a 16 year old Dutch boy (Christiaan Huygens) 
discovered that this result must be wrong. Finally, the solution of the problem of 
the shape of a hanging flexible line (“Linea Catenaria vel Funicularis”) by Leib- 
niz (1691b) and Joh. Bernoulli (1691) was an enormous success for the “new” 
calculus. Here are Johann’s ideas ( Opera vol. III, p. 491-493). 

We let B be the lowest point and A an arbitrary point on the curve (Fig. 7.4). 
We then draw the tangents AE and BE and imagine the mass of the chain of length 
s between A and B concentrated in the point E hanging on two threads without 
mass (“duorum filiorum nullius gravitatis”). Since the mass in E is proportional to 
s, the parallelogram of forces in E shows that the slope in A is proportional to the 
arc length, i.e., 

1 Reproduced with permission of Bibl. Publ. Univ. Genève.

FIGURE 7.4. The catenary        FIGURE 7.5. Catenary (Leibniz 1691) 2


(7.6)    $c\cdot y' = s.$

From here, Johann’s computations are very complicated, using second differentials
(see Opera vol. III, p. 426). They become easy, however, if we replace, in the
spirit of Riccati (see (7.21) below), the derivative $y'$ by a new variable $p$ and have
after differentiation

(7.7)    $c\cdot dp = ds = \sqrt{1 + p^2}\;dx,$

a differential equation between the variables $p$ and $x$. Integration gives

(7.8)    $p = \sinh\Bigl(\frac{x - x_0}{c}\Bigr) \qquad$ and $\qquad y = K + c\cdot\cosh\Bigl(\frac{x - x_0}{c}\Bigr).$


The Brachistochrone. 

Given two points A and B in a vertical plane, determine the path AMB 
along which a moving particle M, starting at A and descending solely un- 
der the influence of its weight, reaches B in the shortest time. 

(Joh. Bernoulli 1696)

This problem seems to be one of the most curious and beautiful that has ever
been proposed, and I would very much like to apply my efforts to it, but
for this it would be necessary that you reduce it to pure mathematics, since
physics bothers me . . .

(de L’Hospital, letter to Joh. Bernoulli, June 15, 1696)

2 Reproduced with permission of Bibl. Publ. Univ. Genève.
Galilei proves in 1638 that a body sliding from A to C (Fig. 7.7) takes less time on
the detour ADC than on the shortest path (due to its larger initial velocity). He continues
and proves that ADEC, ADEFC, ADEFGC are always quicker and finally
concludes that the circle is the quickest of all paths. Hearing that his brother Jacob
makes the same mistake, Johann (1696) seizes this as the occasion for organizing
a public contest to find the brachistochrone line (βραχύς = short, χρόνος =
time). The solutions handed in on time, including Jacob’s, were unfortunately all
correct; nevertheless, Johann’s is the most elegant one: he makes an analogy to
“Fermat’s Principle” (see Eq. (2.5)):



He thinks of many layers where the “speed of light” is given by $v = \sqrt{2gy}$
(see (7.2) and Fig. 7.6). The quickest path is the one satisfying everywhere the law
of refraction (Fermat’s principle),

    $\frac{v}{\sin\alpha} = K.$

Hence, we have, because of $\sin\alpha = dx/ds$,

(7.9)    $\sqrt{1 + \Bigl(\frac{dy}{dx}\Bigr)^2}\cdot\sqrt{2gy} = K \qquad$ or $\qquad dx = \sqrt{\frac{y}{c - y}}\cdot dy.$

Still in accordance with “ergo & horum integralia æquantur”, the substitution

(7.10)    $y = c\cdot\sin^2 u = \frac{c}{2} - \frac{c}{2}\cos 2u$

leads to the formula

(7.11)    $x - x_0 = cu - \frac{c}{2}\sin 2u$

“ex qua concludo Curvam Brachystochronam esse Cycloidem vulgarem”.
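
That the cycloid (7.10), (7.11) indeed satisfies the differential equation (7.9) can be checked numerically. The following Python sketch is a small illustration (the names `cycloid` and `slope_residual` and the step size are our own assumptions); for $0 < u < \pi/2$ it compares a difference quotient for $dx/dy$ along the curve with $\sqrt{y/(c - y)}$.

```python
import math


def cycloid(u, c=1.0, x0=0.0):
    """Point (x, y) of the cycloid given by (7.10) and (7.11)."""
    x = x0 + c * u - 0.5 * c * math.sin(2.0 * u)
    y = c * math.sin(u) ** 2
    return x, y


def slope_residual(u, c=1.0, h=1e-6):
    """Difference quotient for dx/dy minus sqrt(y/(c - y)), for 0 < u < pi/2."""
    x1, y1 = cycloid(u - h, c)
    x2, y2 = cycloid(u + h, c)
    _, y = cycloid(u, c)
    return (x2 - x1) / (y2 - y1) - math.sqrt(y / (c - y))


for u in (0.3, 0.8, 1.2):
    print(u, slope_residual(u))   # values close to zero
```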
Some Types of Integrable Equations

We now discuss some of the simplest types of differential equations, which can be 
solved by the computation of integrals. 

Equation with Separable Variables. 

(7.12) y' = f(x)g(y). 

All of the preceding examples, namely, (7.3), (7.5), (7.7), and (7.9), are of this 
type. They are solved by writing y' = dy/dx, by “separation of variables” and 
integration (“ergo & . . .”), i.e., 

(7.13)    $\frac{dy}{g(y)} = f(x)\,dx \qquad$ and $\qquad \int \frac{dy}{g(y)} = \int f(x)\,dx + C.$

If $G(y)$ and $F(x)$ are primitives of $1/g(y)$ and $f(x)$, respectively, the solution is
expressed by $G(y) = F(x) + C$.

Linear Homogeneous Equation. 

(7.14) y' = f(x)y. 

This is a special case of (7.12). Its solution is given by 

(7.15)    $\ln y = \int f(x)\,dx + C, \qquad$ or $\qquad y = C\cdot\exp\Bigl(\int f(x)\,dx\Bigr).$


Linear Inhomogeneous Equation. 

(7.16) y' = f(x)y + g(x). 


Joh. Bernoulli proposes to write the solution as a product of two functions $y(x) =
u(x)\cdot v(x)$ (like Tartaglia’s idea, Eq. (I.1.5)). We then obtain

    $\frac{dy}{dx} = \frac{du}{dx}\cdot v + u\cdot\frac{dv}{dx} = f(x)\cdot u\cdot v + g(x).$

We can now equalize the two terms separately and find

(7.17a)    $\frac{du}{dx} = f(x)\cdot u \qquad$ to obtain $u$,

(7.17b)    $\frac{dv}{dx} = \frac{g(x)}{u(x)} \qquad$ to obtain $v$.

Equation (7.17a) is a homogeneous linear equation for $u$ and its solution is given
by (7.15). The function $v(x)$ is then obtained by integration of (7.17b). Consequently,
the solution of (7.16) is

(7.18)    $y(x) = C\cdot u(x) + u(x)\int_0^x \frac{g(t)}{u(t)}\,dt, \qquad u(x) = \exp\Bigl(\int_0^x f(t)\,dt\Bigr).$

This relation expresses the fact that the solution of (7.16) is a sum of the general
solution of the homogeneous equation with a particular solution of the inhomogeneous
equation.
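
The solution formula for the linear inhomogeneous equation can be tried out numerically. The following Python sketch is a minimal illustration under several assumptions of ours: the integrals are taken from 0 to $x$ (so that $y(0) = C$), they are approximated by Simpson’s rule, and the names `simpson` and `solve_linear` are hypothetical. For $y' = y + x$, $y(0) = 2$, the exact solution is $y = 3e^x - x - 1$.

```python
import math


def simpson(f, a, b, n=200):
    """Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return h * total / 3.0


def solve_linear(f, g, C, x):
    """Value y(x) of the solution of y' = f(x) y + g(x) with y(0) = C."""
    def u(s):
        # solution of the homogeneous equation with u(0) = 1
        return math.exp(simpson(f, 0.0, s))
    return C * u(x) + u(x) * simpson(lambda t: g(t) / u(t), 0.0, x)


approx = solve_linear(lambda t: 1.0, lambda t: t, 2.0, 1.0)
exact = 3.0 * math.e - 2.0
print(approx, exact)
```

The nested quadrature is wasteful but keeps the sketch close to the formula; for serious computations one would tabulate $u$ once.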

Bernoulli’s Differential Equation. 

In truth, there is nothing more ingenious than the solution that you give for 
your brother’s equation; and this solution is so simple that one is surprised 
at how difficult the problem appeared to be: this is indeed what one calls an 
elegant solution. (P. Varignon, letter to Joh. Bernoulli “6 Aoust 1697”)

In 1695, Jac. Bernoulli struggles for months on the solution of

(7.19)    $y' = f(x)\cdot y + g(x)\cdot y^n.$

This is a good occasion for Jacob to organize an official contest. Unfortunately,
Johann has straightaway two elegant ideas (see Joh. Bernoulli 1697b). The first
idea is treated in Exercise 7.2. The second one is the same as explained above,
namely to write the solution as $y(x) = u(x)\cdot v(x)$. For the differential equation
(7.19) this again yields (7.17a) for $u$ and

(7.20)    $\frac{dv}{dx} = g(x)\,u^{n-1}(x)\,v^n,$

a differential equation that can be solved by separation of variables. This leads to
the solution

    $y(x) = u(x)\Bigl(C + (1 - n)\int_0^x g(t)\,u^{n-1}(t)\,dt\Bigr)^{1/(1-n)},$

where $u(x)$ is as in (7.18).


Second-Order Differential Equations 

To free the above formula from the second differences, . . . , we denote the 
subnormal BF by p. (Riccati 1712) 

A second-order differential equation is of the form

    $y'' = f(x, y, y').$

The analytic solution of such an equation is very seldom possible. There are a few
exceptions.

Equations Independent of y. It is natural to put $p = y'$, so that the differential
equation $y'' = f(x, y')$ becomes the first-order equation $p' = f(x, p)$. We remark
that the differential equation (7.7) of the catenary is actually of this type.

Equations Independent of x.

(7.21)    $y'' = f(y, y').$

The idea (Riccati 1712) is to consider $y$ as an independent variable and to search
for a function $p(y)$ such that $y' = p(y)$. The chain rule gives

    $y'' = \frac{dp}{dx} = \frac{dp}{dy}\cdot\frac{dy}{dx} = p'\cdot p,$

and Eq. (7.21) becomes the first-order equation

(7.22)    $p'\cdot p = f(y, p).$

When the function $p(y)$ has been found from (7.22), it remains to integrate $y' =
p(y)$, which is an equation of type (7.12).

Example. The movement of a pendulum
(see the sketch by Leonardo da Vinci) is
described by the equation

(7.23)    $y'' + \sin y = 0$

($y$ denotes the deviation from equilibrium).
Since Eq. (7.23) does not depend on $t$ (we
write $t$ instead of $x$, because this variable
denotes the time in this example), we can
use the above transformation to obtain

[Sketch by Leonardo da Vinci. © Bibl. Nacional, Codex Madrid I 147r]

    $p\cdot dp = -\sin y\cdot dy \qquad$ and $\qquad \frac{p^2}{2} = \cos y + C.$

If we denote the amplitude of the oscillations by $A$ (for which $p = y' = 0$) we
have $C = -\cos A$ and get

(7.24)    $p = \frac{dy}{dt} = \sqrt{2\cos y - 2\cos A},$

which is a differential equation for $y$. Separation of the variables finally yields the
solution expressed in implicit form with an elliptic integral

(7.25)    $t = \int_0^y \frac{d\eta}{\sqrt{2\cos\eta - 2\cos A}}$

(the integration constant is determined by the assumption that $y = 0$ for $t = 0$).

If $T$ is the period of the oscillations, the maximal deviation $A$ is attained for
$t = T/4$. Hence, the period satisfies

(7.26)    $T = 4\int_0^A \frac{dy}{\sqrt{2\cos y - 2\cos A}} = 2\int_0^A \frac{dy}{\sqrt{\sin^2(A/2) - \sin^2(y/2)}}.$

We see that it depends on the amplitude $A$ and is close to $2\pi$ if $A$ is small (Exercise 7.5).
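
The dependence of the period on the amplitude can be made visible numerically. The integrand of the period integral is singular at $y = A$; the following Python sketch therefore assumes the substitution $\sin(y/2) = k\sin\alpha$ with $k = \sin(A/2)$ (the one suggested in Exercise 7.5), which gives $T = 4\int_0^{\pi/2} d\alpha/\sqrt{1 - k^2\sin^2\alpha}$, and applies the trapezoidal rule to this smooth integrand. The function name and the number of points are our own choices.

```python
import math


def pendulum_period(A, n=64):
    """Period of y'' + sin y = 0 for the amplitude A (0 <= A < pi)."""
    k = math.sin(A / 2.0)
    h = (math.pi / 2.0) / n

    def f(a):
        return 1.0 / math.sqrt(1.0 - (k * math.sin(a)) ** 2)

    # trapezoidal rule on [0, pi/2]; the integrand is smooth and periodic
    total = 0.5 * (f(0.0) + f(math.pi / 2.0))
    for i in range(1, n):
        total += f(i * h)
    return 4.0 * h * total


for A in (0.001, 0.5, 1.0, 2.0, 3.0):
    print(A, pendulum_period(A), pendulum_period(A) / (2.0 * math.pi))
```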
FIGURE 7.8. The isochronous pendulum of Huygens


The Isochronous Pendulum. The problem consists in modifying the standard
pendulum in such a way that the period becomes independent of the amplitude.
The idea of Huygens (1673, Horologium Oscillatorium) was to modify the circle
of the standard pendulum in such a way that the accelerating force becomes
proportional to the arc length $s$. The movement of the pendulum would then be
described by

(7.27)    $s'' + Ks = 0,$

which has oscillations independent of the amplitude.

Solution. We see from the two similar triangles in Fig. 7.8 (right) that the accelerating
force is $f = -dy/ds$, so that our requirement $f = -Ks$ becomes

(7.28)    $dy = K\cdot s\,ds.$

If $s = 0$ for $y = 0$ (i.e., the origin is placed in the lowest point) we obtain by
integration

(7.29)    $y = \frac{K}{2}\,s^2 \qquad$ or $\qquad s = \sqrt{\frac{2y}{K}}.$

Thus, for our curve the height is proportional to the square of the arc length
(Joh. Bernoulli 1691/92b, p. 489-490). Inserting $s$ from (7.29) into (7.28) gives

    $ds = \sqrt{dx^2 + dy^2} = \frac{dy}{\sqrt{2Ky}}$

or, by taking squares,

(7.30)    $\Bigl(\frac{c}{y} - 1\Bigr)\,dy^2 = dx^2 \qquad$ and $\qquad \sqrt{\frac{c - y}{y}}\;dy = dx$

with $c = 1/(2K)$. Apart from a shift in $y$, this is precisely equation (7.9)
for the brachystochrone, and we see that the isochrone pendulum is a cycloid
as Joh. Bernoulli (1697c) said: “animo revolvens inexpectatam illam identitatem
Tautochronae Hugeniae nostrae que Brachystochronae” (see Fig. 7.8).
7.1 Compute the integral (7.5) for the tractrix with the substitution $y = a\cos t$,
insert $\sin^2 t = 1 - \cos^2 t$, and apply the substitution (5.21).

7.2 (Joh. Bernoulli 1697b). Solve the differential equation “de mon Frére”

(7.31)    $y' = g(x)\cdot y + f(x)\cdot y^n$

by using the transformation $y = v^\beta$. Determine the constant $\beta$ such that
(7.31) becomes a linear differential equation for $v$.

7.3 The logistic law of population growth is given by the differential equation 
(Verhulst 1845) 

$y' = by(a - y),$

where $a$, $b$ are constants. Choose $a = 5$, $b = 2$ and find the solution satisfying $y(0) = 0.1$.

7.4 Show that a differential equation of the form

$y' = G\Bigl(\frac{y}{x}\Bigr)$

can be solved by the substitution $v(x) = y(x)/x$. Apply this method to

$y' = \frac{9x + 2y}{2x + y}.$

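7.5 The solution of the pendulum equation

$y'' + \omega^2 \sin y = 0,$

corresponding to initial values $y(0) = A$, $y'(0) = 0$, has the period

$T = \frac{2}{\omega} \int_0^A \bigl(\sin^2(A/2) - \sin^2(y/2)\bigr)^{-1/2}\, dy$

(see Eq. (7.26)). Set $k = \sin(A/2)$, apply the substitution $\sin(y/2) = k \cdot \sin\alpha$, and compute the first terms of the expansion of $T$ in powers of $k$.

Result. $T = \frac{2\pi}{\omega}\Bigl(1 + k^2 \bigl(\tfrac{1}{2}\bigr)^2 + k^4 \bigl(\tfrac{1\cdot 3}{2\cdot 4}\bigr)^2 + \ldots\Bigr) = \frac{2\pi}{\omega}\Bigl(1 + \frac{A^2}{16} + \frac{11 A^4}{3072} + \frac{173 A^6}{737280} + \ldots\Bigr).$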


7.6 Solve the differential equation

$y' = \frac{4 + y^2}{4 + x^2}.$

7.7 The motion of a body in the earth’s gravitational field is described by the differential equation

$y'' = -\frac{gR^2}{y^2},$

where $g = 9.81$ m/sec$^2$, $R = 6.36 \cdot 10^6$ m, and $y$ is the distance of the body to the center of the earth. Determine the constants in the solution such that $y(0) = R$ and $y'(0) = v$. Then, find the smallest velocity $v$ for which the body will not return to earth (escape velocity).
II.8 Linear Differential Equations

. . . it is today quite impossible to swallow a single line of d’Alembert, while most writings of Euler can still be read with delight.

(Jacobi, see Spiess 1929, p. 139)

Let $a_0(x), a_1(x), \ldots, a_{n-1}(x)$ be given functions. We call

(8.1)    $y^{(n)} + a_{n-1}(x)\, y^{(n-1)} + \ldots + a_1(x)\, y' + a_0(x)\, y = 0$

a homogeneous linear differential equation of order $n$ and

(8.2)    $y^{(n)} + a_{n-1}(x)\, y^{(n-1)} + \ldots + a_1(x)\, y' + a_0(x)\, y = f(x)$

an inhomogeneous linear differential equation. For the left-hand side of these equations we introduce the abbreviation

(8.3)    $\mathcal{L}(y) := y^{(n)} + a_{n-1}(x)\, y^{(n-1)} + \ldots + a_0(x)\, y,$

so that (8.1) and (8.2) become

(8.4)    $\mathcal{L}(y) = 0$   and   $\mathcal{L}(y) = f,$

respectively. We call $\mathcal{L}$ a differential operator. It operates on functions $y(x)$, and the result $\mathcal{L}(y)$ is again a function, given by (8.3). The main property of this operator is that it is linear, i.e.,

(8.5)    $\mathcal{L}(c_1 y_1 + c_2 y_2) = c_1 \mathcal{L}(y_1) + c_2 \mathcal{L}(y_2).$

An obvious consequence of this linearity is the following result. 

(8.1) Lemma. Given $n$ solutions $y_1(x), y_2(x), \ldots, y_n(x)$ for the homogeneous equation (8.1), then for arbitrary constants $c_1, \ldots, c_n$ the function

(8.6)    $c_1 y_1(x) + c_2 y_2(x) + \ldots + c_n y_n(x)$

is also a solution of the same equation. □

Remark. The solutions of the equations of order 1 involve one constant (see Sect. II.7) and the equations of order 2 have two arbitrary constants (see, for example, Eq. (7.23)). Arguing by analogy, we can assume (Euler) that the equations of order $n$ have $n$ constants and that (8.6) is the general solution of (8.1), if $y_1(x), \ldots, y_n(x)$ are linearly independent functions. Here, the functions $y_1(x), \ldots, y_n(x)$ are called linearly independent if the linear combination (8.6) vanishes identically only in the case when all $c_i$ are zero. For example, $1, x, x^2, x^3$ are linearly independent functions.
(8.2) Lemma.

General solution of the homogeneous equation (8.1)
+
one particular solution of the inhomogeneous equation (8.2)
=
general solution of the inhomogeneous equation (8.2).

Proof. Let $\tilde y$ be a particular solution of (8.2), i.e., $\mathcal{L}(\tilde y) = f$. For an arbitrary solution $y$ of (8.1) (i.e., $\mathcal{L}(y) = 0$) we then have $\mathcal{L}(y + \tilde y) = f$ by (8.5), so that $y + \tilde y$ is a solution of (8.2).

On the other hand, if $\hat y$ is another solution of (8.2) (i.e., $\mathcal{L}(\hat y) = f$) then, again by (8.5), we have $\mathcal{L}(\hat y - \tilde y) = 0$ and $\hat y = \tilde y + (\hat y - \tilde y)$ is the sum of $\tilde y$ and a solution of the homogeneous equation (8.1). □

Conclusion. In order to solve the differential equations (8.1) and (8.2), one has to 

- find n different solutions (linearly independent) of (8.1), and 

- find one solution of (8.2). 

Homogeneous Equation with Constant Coefficients

The complete solution of Eq. (8.1) is very seldom possible. However, there are a few exceptions. The most important one is when the coefficients $a_i(x)$ are independent of $x$, i.e.,

(8.7)    $y^{(n)} + a_{n-1}\, y^{(n-1)} + \ldots + a_1 y' + a_0 y = 0.$

Another exception is when $a_i(x) = a_i x^{i-n}$ (“Cauchy’s Equation”). This case will be considered at the end of this section.

The essential idea for solving (8.7) (Euler communicated it on Sept. 15, 1739 in a letter to Joh. Bernoulli and published it in 1743) is to search for solutions of the form

(8.8)    $y(x) = e^{\lambda x},$

where $\lambda$ is a constant to be determined. Computing the derivatives

$y'(x) = \lambda e^{\lambda x}, \quad y''(x) = \lambda^2 e^{\lambda x}, \quad \ldots, \quad y^{(n)}(x) = \lambda^n e^{\lambda x},$

and inserting them into Eq. (8.7), yields

(8.9)    $(\lambda^n + a_{n-1}\lambda^{n-1} + \ldots + a_1 \lambda + a_0)\, e^{\lambda x} = 0.$

Hence, the function (8.8) is a solution of (8.7) if and only if $\lambda$ is a root of the so-called characteristic equation

(8.10)    $\chi(\lambda) = 0, \qquad \chi(\lambda) := \lambda^n + a_{n-1}\lambda^{n-1} + \ldots + a_1\lambda + a_0.$
Distinct Roots. If Eq. (8.10) has $n$ distinct roots, say $\lambda_1, \ldots, \lambda_n$, then $e^{\lambda_1 x}, \ldots, e^{\lambda_n x}$ are $n$ linearly independent solutions of (8.7) (see Exercise 8.1). The general solution is thus given by

(8.11)    $y(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} + \ldots + c_n e^{\lambda_n x}.$


Multiple Roots. Consider first the simple differential equation

(8.12)    $y^{(n)} = 0,$

where the characteristic equation $\lambda^n = 0$ has a root zero of multiplicity $n$. Obviously, the general solution of (8.12) is $c_1 + c_2 x + c_3 x^2 + \ldots + c_n x^{n-1}$, a polynomial of degree $n - 1$.

Next, we study the equation

(8.13)    $y''' - 3a y'' + 3a^2 y' - a^3 y = 0,$

where the characteristic equation $(\lambda - a)^3 = 0$ has the root $a$ of multiplicity 3. We introduce a new unknown function $u(x)$ by the relation (Euler 1743b)

(8.14)    $y(x) = e^{ax} \cdot u(x).$

Then, differentiating this relation three times and inserting the results into (8.13), we obtain for $u$ Eq. (8.12) with $n = 3$. Therefore, the general solution of (8.13) is given by

(8.15)    $y(x) = e^{ax} \cdot (c_1 + c_2 x + c_3 x^2).$


Differential Operators. The above calculations become particularly elegant if we introduce, for a given constant $a$, the differential operator $D_a$ by

(8.16)    $D_a y = y' - a \cdot y.$

The composition of two such operators $D_a$ and $D_b$ gives

(8.17)    $D_b D_a y = (y' - ay)' - b(y' - ay) = y'' - (a + b) y' + ab\, y = D_a D_b y.$

We observe that $D_a$ and $D_b$ commute and that $D_a D_b D_c \ldots y = 0$ is the differential equation (8.7) whose coefficients are those of the characteristic polynomial $(\lambda - a)(\lambda - b)(\lambda - c) \ldots$. Therefore, Eq. (8.13) is the same as

(8.13')    $D_a^3 y = 0.$

Applying $D_a$ to (8.14), we obtain $D_a y = e^{ax} \cdot u'$ (the terms $a e^{ax} u$ cancel), then $D_a^2 y = e^{ax} \cdot u''$, and finally $D_a^3 y = e^{ax} \cdot u'''$. This verifies that (8.15) is the general solution of (8.13).

(8.3) Theorem (Euler 1743b). Suppose that the characteristic polynomial (8.10) has the factorization

$\chi(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \cdot \ldots \cdot (\lambda - \lambda_k)^{m_k}$

(with distinct $\lambda_i$), then the general solution of (8.7) is given by

(8.18)    $y(x) = p_1(x) e^{\lambda_1 x} + p_2(x) e^{\lambda_2 x} + \ldots + p_k(x) e^{\lambda_k x},$

where the $p_i(x)$ are arbitrary polynomials of degree $m_i - 1$ (this solution involves precisely $\sum_{i=1}^{k} m_i = n$ constants).

Proof. We illustrate the proof for the case of two multiple roots $\chi(\lambda) = (\lambda - a)^3 (\lambda - b)^4$. Because of the permutability of $D_a$ and $D_b$, we can write the differential equation either as

(8.19)    $D_b^4 D_a^3 y = 0$   or as   $D_a^3 D_b^4 y = 0.$

The solution $y = e^{ax} \cdot (c_1 + c_2 x + c_3 x^2)$ of $D_a^3 y = 0$ is seen to be reduced to zero by the left-hand version of (8.19); the solution $y = e^{bx} \cdot (c_4 + c_5 x + c_6 x^2 + c_7 x^3)$ of $D_b^4 y = 0$ is annulled by the right-hand version. Both are therefore solutions and have together seven free constants (see Exercise 8.2 for the linear independence of the functions involved). □

Avoiding Complex Arithmetic. The result of Theorem 8.3 is valid also for complex $\lambda_i$. If, however, the coefficients $a_i$ of Eq. (8.7) are real, we are mainly interested in real-valued solutions. The fact that complex roots of real polynomials always appear in conjugate pairs allows us to simplify (8.18). Let $\lambda_1 = \alpha + i\beta$ and $\lambda_2 = \alpha - i\beta$ be two such roots. The corresponding part of the solution (8.18) is then a polynomial multiplied by

(8.20)    $e^{\alpha x} \bigl(c_1 e^{i\beta x} + c_2 e^{-i\beta x}\bigr).$

Using Euler’s formula (I.5.4), this expression becomes

(8.21)    $e^{\alpha x} \bigl(d_1 \cos\beta x + d_2 \sin\beta x\bigr),$

where $d_1 = c_1 + c_2$ and $d_2 = i(c_1 - c_2)$ are new constants. This expression can be further simplified by the use of $d_2 + i d_1 = C e^{i\varphi} = C\cos\varphi + iC\sin\varphi$. We then get with Eq. (I.4.3) (see Fig. 8.1)

$C e^{\alpha x} \bigl(\sin\varphi \cos\beta x + \cos\varphi \sin\beta x\bigr) = C e^{\alpha x} \sin(\beta x + \varphi).$

FIGURE 8.1. Stable and unstable oscillations


Example. Equation (7.23) of the pendulum can, for small oscillations, be simplified by replacing $\sin y$ by $y$, and becomes

(8.22)    $y'' + \omega^2 y = 0, \qquad \omega^2 = g/\ell,$

where $g = 9.81$ m/sec$^2$ and $\ell$ is the length of the rod. The characteristic equation $\lambda^2 + \omega^2 = 0$ has the roots $\pm i\omega$. Hence, the general solution of (8.22) is

$y(t) = C \sin(\omega t + \varphi),$

which has period

$T = 2\pi/\omega = 2\pi\sqrt{\ell/g}.$


Inhomogeneous Linear Equations 

The problem consists in finding one particular solution of $\mathcal{L}(y) = f$, i.e.,

(8.23)    $y^{(n)} + a_{n-1}\, y^{(n-1)} + \ldots + a_1 y' + a_0 y = f(x).$

As an immediate consequence of the linearity (8.5), we have the following result.

(8.4) Lemma (Superposition Principle). Let $y_1(x)$ and $y_2(x)$ be solutions of $\mathcal{L}(y_1) = f_1$ and $\mathcal{L}(y_2) = f_2$, then $c_1 y_1(x) + c_2 y_2(x)$ is a solution of $\mathcal{L}(y) = c_1 f_1 + c_2 f_2$. □

In situations where the inhomogeneity $f(x)$ in (8.23) can be split into a sum of simple terms, the individual terms can be treated separately.

The Quick Method (Euler 1750b). This approach is possible if $f(x)$ is a linear combination of $x^j$, $e^{ax}$, $e^{ax}\sin(\omega x)$, . . .; more precisely, if $f(x)$ itself is a solution of some homogeneous linear equation with constant coefficients. The idea is to look for a solution with the same structure.

Example. Consider a case where $f$ is a polynomial of degree 2, e.g.,

(8.24)    $y''' + 5y'' + 2y' + y = 2x^2 + x.$

We will search for a solution of the form

(8.25)    $y(x) = a + bx + cx^2.$

Computing the derivatives of (8.25) and inserting them into (8.24) yields

$c x^2 + (b + 4c)\, x + (a + 2b + 10c) = 2x^2 + x.$

Comparison of the coefficients gives $c = 2$, $b = -7$ and $a = -6$, so that a particular solution of (8.24) is

$y(x) = 2x^2 - 7x - 6.$
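The coefficient comparison above can be checked mechanically. The following is a minimal sketch, not part of the original text: polynomials are stored as coefficient lists in ascending powers, and the helper names `poly_derivative` and `apply_operator` are hypothetical, chosen only for this illustration.

```python
def poly_derivative(coeffs):
    """Derivative of p0 + p1*x + p2*x^2 + ... given as [p0, p1, p2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]


def apply_operator(op, coeffs):
    """Apply y -> sum_k op[k] * y^(k) to a polynomial; op[k] multiplies the k-th derivative."""
    result = [0] * len(coeffs)
    current = list(coeffs)
    for weight in op:
        for i, c in enumerate(current):
            result[i] += weight * c
        current = poly_derivative(current)
    return result


# Left-hand side of (8.24): y''' + 5y'' + 2y' + y, i.e. op = [1, 2, 5, 1].
particular = [-6, -7, 2]                       # y(x) = -6 - 7x + 2x^2
print(apply_operator([1, 2, 5, 1], particular))  # coefficients of x + 2x^2
```

The result `[0, 1, 2]` stands for $x + 2x^2$, the right-hand side of (8.24).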

Example. Suppose now that $f(x)$ is a sine function

(8.26)    $y'' - y' + y = \sin 2x.$

It is not sufficient to take $y(x) = a \cdot \sin 2x$, because $y'$ also produces $\cos 2x$. Therefore, we put

(8.27)    $y(x) = a \cdot \sin 2x + b \cdot \cos 2x,$

compute the derivatives, and insert them into (8.26). This gives the condition

$(a + 2b - 4a) \sin 2x + (b - 2a - 4b) \cos 2x = \sin 2x.$

We obtain the linear system $-3a + 2b = 1$, $-2a - 3b = 0$ with the solution $a = -3/13$, $b = 2/13$. Consequently, the particular solution is

(8.28)    $y(x) = -\frac{3}{13} \sin 2x + \frac{2}{13} \cos 2x.$

Another possibility for solving (8.26) is to consider the equation

(8.29)    $y'' - y' + y = e^{2ix}$

and to search for a solution of the form $y(x) = A e^{2ix}$. Inserting its derivatives yields $-4A - 2iA + A = 1$ and $A = (-3 + 2i)/13$. Hence, the solution of (8.29) is

(8.30)    $y(x) = \frac{-3 + 2i}{13}\, e^{2ix}.$

Since (8.26) is just the imaginary part of (8.29), we get a solution of (8.26) by taking the imaginary part of (8.30).
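The complex ansatz lends itself to a short numerical check. The sketch below is an illustration only; the function name `exponential_ansatz` and the convention of passing the characteristic polynomial as `[a0, a1, ..., 1]` are assumptions made here, not notation from the text.

```python
def exponential_ansatz(char_coeffs, mu):
    """Return A such that y = A*exp(mu*x) solves chi(D) y = exp(mu*x).

    char_coeffs = [a0, a1, ..., 1] are the coefficients of chi in ascending powers.
    """
    chi = sum(a * mu ** k for k, a in enumerate(char_coeffs))
    if chi == 0:
        # mu is a characteristic root: the plain ansatz fails (case of resonance).
        raise ValueError("mu is a root of the characteristic polynomial")
    return 1 / chi


A = exponential_ansatz([1, -1, 1], 2j)   # y'' - y' + y = exp(2ix), Eq. (8.29)
# Im(A * e^{2ix}) = Re(A) * sin 2x + Im(A) * cos 2x, so:
a, b = A.real, A.imag
print(a, b)                              # close to -3/13 and 2/13
```

The real and imaginary parts of `A` reproduce the coefficients of (8.28), which is why the two approaches agree.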

Justification of This Approach. By assumption, $f(x)$ satisfies $\mathcal{L}_1(f) = 0$, where $\mathcal{L}_1 = \ldots$ is some differential operator with constant coefficients. Applying this operator to Eq. (8.23), i.e. $\mathcal{L}(y) = f$, we get $(\mathcal{L}_1 \mathcal{L})(y) = 0$, and the solution of (8.23) is seen to satisfy the linear homogeneous differential equation $(\mathcal{L}_1 \mathcal{L})(y) = 0$. The general solution of this equation is known by Theorem 8.3.
Case of Resonance. Consider, for example, the equation

(8.31)    $y'' + y = \sin x.$

Here, we cannot take $y(x) = a \sin x + b \cos x$, because this function is itself a solution of the homogeneous equation. Inspired by the discussion on double roots (see also Fig. 8.2), we try

(8.32)    $y(x) = ax \sin x + bx \cos x.$

The usual procedure (inserting the derivatives of (8.32) into (8.31)) yields

$2a \cos x - 2b \sin x = \sin x,$

so that $a = 0$ and $b = -1/2$. A particular solution of (8.31) is thus

(8.33)    $y(x) = -\frac{1}{2}\, x \cos x.$

It explodes for $x \to \infty$ (see Fig. 8.2).

Method of Variation of Constants (Lagrange 1775, 1788). This is a general method that allows us to find a particular solution of (8.2) in the case where the general solution of the homogeneous equation (8.1) is known. In order to simplify the notation, we explain this method for the case $n = 2$.

Consider the problem

(8.34)    $y'' + a(x) y' + b(x) y = f(x)$

and assume that $y_1(x)$ and $y_2(x)$ are two known independent solutions of the homogeneous equation $y'' + a(x) y' + b(x) y = 0$. The idea is to look for a solution of the form

(8.35)    $y(x) = c_1(x) y_1(x) + c_2(x) y_2(x)$

(hence the name “variation of constants”). The derivative of (8.35) is

(8.36)    $y' = c_1' y_1 + c_2' y_2 + c_1 y_1' + c_2 y_2'.$

In order to avoid complications with higher order derivatives, we require that

(8.37)    $c_1' y_1 + c_2' y_2 = 0$

so that the derivative of (8.35) becomes $y' = c_1 y_1' + c_2 y_2'$. The second derivative then becomes

(8.38)    $y'' = c_1' y_1' + c_2' y_2' + c_1 y_1'' + c_2 y_2''.$

If all these formulas are inserted into (8.34), the terms containing $c_1$ and $c_2$ disappear, because we have assumed that $y_1(x)$ and $y_2(x)$ are solutions of the homogeneous equation. All that remains is

(8.39)    $c_1' y_1' + c_2' y_2' = f(x).$

This, together with (8.37), constitutes the linear system

(8.40)    $\begin{pmatrix} y_1(x) & y_2(x) \\ y_1'(x) & y_2'(x) \end{pmatrix} \begin{pmatrix} c_1'(x) \\ c_2'(x) \end{pmatrix} = \begin{pmatrix} 0 \\ f(x) \end{pmatrix}$   or   $W(x)\, c'(x) = F(x).$

The matrix $W(x)$ is called the Wronskian. Computing $c'(x)$ from (8.40) and integrating yields

$c(x) = \int_0^x W^{-1}(t) F(t)\, dt,$

and a solution of (8.34) is given by

(8.41)    $y(x) = \bigl(y_1(x), y_2(x)\bigr)\, c(x) = \int_0^x \bigl(y_1(x), y_2(x)\bigr) W^{-1}(t) F(t)\, dt.$
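Example. Consider the equation with constant coefficients

(8.42)    $y'' + 2a y' + b y = f(x),$

where $a^2 < b$. The homogeneous equation possesses the solutions $y_1(x) = e^{(\alpha + i\beta)x}$, $y_2(x) = e^{(\alpha - i\beta)x}$, where $\alpha = -a$ and $\beta = \sqrt{b - a^2}$. The Wronskian and its inverse are

$W(x) = e^{\alpha x} \begin{pmatrix} e^{i\beta x} & e^{-i\beta x} \\ (\alpha + i\beta) e^{i\beta x} & (\alpha - i\beta) e^{-i\beta x} \end{pmatrix},$

$W^{-1}(x) = \frac{e^{-\alpha x}}{2i\beta} \begin{pmatrix} (-\alpha + i\beta) e^{-i\beta x} & e^{-i\beta x} \\ (\alpha + i\beta) e^{i\beta x} & -e^{i\beta x} \end{pmatrix}.$

Consequently, we find from (8.41) that

(8.43)    $y(x) = \int_0^x e^{\alpha(x - t)}\, \frac{e^{i\beta(x - t)} - e^{-i\beta(x - t)}}{2i\beta}\, f(t)\, dt = \frac{1}{\beta} \int_0^x e^{\alpha(x - t)} \sin\bigl(\beta(x - t)\bigr)\, f(t)\, dt.$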
This formula is valid for any function f(t). 
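As a plausibility check of (8.43), the integral can be approximated by a simple quadrature rule and compared with a case whose answer is known in closed form. The sketch below uses the midpoint rule (a choice made here, not in the text); the function name `particular_solution` and the parameter `n` are hypothetical.

```python
import math


def particular_solution(f, a, b, x, n=2000):
    """Approximate (8.43) for y'' + 2a y' + b y = f(x), a^2 < b, by the midpoint rule."""
    alpha = -a
    beta = math.sqrt(b - a * a)
    h = x / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h                       # midpoint of the i-th subinterval
        total += math.exp(alpha * (x - t)) * math.sin(beta * (x - t)) * f(t)
    return total * h / beta


# For y'' + y = 1 (a = 0, b = 1), formula (8.43) gives y(x) = 1 - cos x.
x = 2.0
print(particular_solution(lambda t: 1.0, 0.0, 1.0, x), 1 - math.cos(x))
```

The solution produced by (8.43) satisfies $y(0) = 0$ and $y'(0) = 0$, which is consistent with $1 - \cos x$ in this test case.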
Cauchy’s Equation

An equation of the form

(8.44)    $y^{(n)} + \frac{a_{n-1}}{x}\, y^{(n-1)} + \ldots + \frac{a_1}{x^{n-1}}\, y' + \frac{a_0}{x^n}\, y = 0$

is usually called “Cauchy’s equation”. Its analytic solution was discussed in full detail by Euler (1769, “Sectio Secunda, Caput V”). Instead of $e^{\lambda x}$, one looks for solutions of the form

(8.45)    $y(x) = x^r.$

Example. Consider the problem

(8.46)    $y'' + \frac{1}{x}\, y' - \frac{1}{x^2}\, y = 0.$

Inserting (8.45) yields

$\bigl(r(r - 1) + r - 1\bigr)\, x^{r-2} = 0.$

The roots of this equation are $r = 1$ and $r = -1$. Hence, the general solution of (8.46) is

(8.47)    $y(x) = c_1 x + \frac{c_2}{x}.$

Another possibility for solving (8.44) is the use of the transformation

(8.48)    $x = e^t, \qquad y(x) = z(t).$

Since

(8.49)    $y' = \frac{z'}{x}, \qquad y'' = \frac{z'' - z'}{x^2},$

Eq. (8.46) becomes an equation with constant coefficients $z'' - z = 0$, to which we can apply the above theory (Theorem 8.3). This gives $z(t) = c_1 e^t + c_2 e^{-t}$, which, after back substitution, becomes (8.47) again.
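For second-order Cauchy equations the exponents $r$ are the roots of a quadratic, which can be sketched in a few lines. This is an illustration under the assumption that the equation is written as $y'' + (p/x)\, y' + (q/x^2)\, y = 0$; the names `cauchy_exponents`, `p` and `q` are introduced here and do not appear in the text.

```python
import cmath


def cauchy_exponents(p, q):
    """Roots r of r(r-1) + p*r + q = 0, obtained by inserting y = x^r
    into y'' + (p/x) y' + (q/x^2) y = 0."""
    disc = cmath.sqrt((p - 1) ** 2 - 4 * q)
    return ((1 - p) + disc) / 2, ((1 - p) - disc) / 2


r1, r2 = cauchy_exponents(1, -1)   # Eq. (8.46): p = 1, q = -1
print(r1, r2)                      # the exponents 1 and -1 of (8.47), as complex numbers
```

`cmath.sqrt` is used so that complex exponents are returned as well; a vanishing discriminant signals a double root, where $x^r$ alone does not give two independent solutions.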


Exercises 

8.1 If $\lambda_1, \ldots, \lambda_n$ are distinct complex numbers, then

(8.50)    $c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} + \ldots + c_n e^{\lambda_n x} = 0$

for all $x$ if and only if $c_1 = c_2 = \ldots = c_n = 0$.

Hint. Differentiating Eq. (8.50) at $x = 0$ shows that $\sum_{i=1}^{n} c_i \lambda_i^k = 0$ for $k = 0, 1, \ldots$. Consider then the expression $\sum_{i=1}^{n} c_i\, p(\lambda_i)$, where $p(x)$ is a polynomial that vanishes for $\lambda_1, \ldots, \lambda_{j-1}, \lambda_{j+1}, \ldots, \lambda_n$ but not for $\lambda_j$.
8.2 For distinct values $\lambda_1, \ldots, \lambda_n$ we have

$\sum_{i=1}^{n} (c_i + d_i x + e_i x^2)\, e^{\lambda_i x} = 0$

for all $x$ if and only if all coefficients $c_i$, $d_i$, $e_i$ vanish.

Hint. Prove that for an arbitrary polynomial $p$ we have

$\sum_{i=1}^{n} \bigl(c_i\, p(\lambda_i) + d_i\, p'(\lambda_i) + e_i\, p''(\lambda_i)\bigr) = 0.$

8.3 A second access to the case of multiple characteristic values (d’Alembert 1748). Suppose that $\lambda$ is a double root of (8.10). Split this root into two neighboring roots $\lambda$ and $\lambda + \varepsilon$ (with $\varepsilon$ infinitely small). In this case, $e^{\lambda x}$, $e^{(\lambda + \varepsilon)x}$, and also the linear combination

$y(x) = \frac{e^{(\lambda + \varepsilon)x} - e^{\lambda x}}{\varepsilon}$

are solutions of the problem. Show that the latter becomes, for $\varepsilon \to 0$, the solution $x e^{\lambda x}$.

8.4 Look for a particular solution of $y'' + 0.2\, y' + y = \sin(\omega x)$ and study its amplitude as a function of $\omega$. What phenomenon can be observed?

8.5 Compute a particular solution of $y'' - 2y' + y = e^x \cos x$

a) by putting $y = A e^x \sin x + B e^x \cos x$;

b) by the method of variation of constants; and

c) by solving $y'' - 2y' + y = e^{(1+i)x}$.

8.6 Solve the following homogeneous and inhomogeneous Cauchy equations: 

$x^2 y'' - x y' - 3y = 0,$
$x^2 y'' - x y' - 3y = x^4,$
$x^2 y'' - 3x y' + 4y = 0.$

The last equation will lead to a problem of double roots. Meet the situation 
with determination (Laurel & Hardy 1933, The Sons of the Desert). 

8.7 Let $y_1(x)$ and $y_2(x)$ be two solutions of $y'' + a(x) y' + b(x) y = 0$. Then, show that the Wronskian (8.40) satisfies

$\det\bigl(W(x)\bigr) = \det\bigl(W(x_0)\bigr) \cdot \exp\Bigl(-\int_{x_0}^{x} a(t)\, dt\Bigr).$

Hint. Find a differential equation for $z(x) = \det\bigl(W(x)\bigr)$.




II.9 Numerical Solution of Differential Equations 

I have always observed that graduate mathematicians and physicists are 
very well acquainted with theoretical results, but have no knowledge of the 
simplest approximate methods. 

(L. Collatz, Num. Beh. Diffgl., Springer 1951, Engl. transl. 1960)

It is often impossible to solve a differential equation 

(9.1) y' = f(x,y) 

by analytic methods (e.g., $y' = x^2 + y^2$). If it is possible, it may happen that the integrals that appear are not elementary (e.g., $y'' + \sin y = 0$, see (7.23)). Even in the case where all integrals are elementary, the formulas obtained might not be useful. For example, the solution of $y' = y^4 + 1$ is given by (see Eq. (5.16))

$\frac{\sqrt{2}}{8} \ln \frac{y^2 + \sqrt{2}\, y + 1}{y^2 - \sqrt{2}\, y + 1} + \frac{\sqrt{2}}{4} \bigl(\arctan(y\sqrt{2} + 1) + \arctan(y\sqrt{2} - 1)\bigr) = x + C,$

which is a rather impractical formula, especially if we want $y$ as a function of $x$. Therefore, it is interesting to search for numerical methods that treat (9.1) directly.

Euler’s Method 

PROBLEM 85: Given an arbitrary differential equation, find for its integral 
a close approximation. (Euler 1768, §650) 

Equation (9.1) prescribes for each point $(x, y)$ a value $f(x, y)$ that is the slope of the solution. One can thus imagine a field of directions (Joh. Bernoulli 1694). The curves that always follow these directions are the solutions of (9.1). See Fig. 9.1 for the “Exemplo res patebit” (called Riccati’s equation)

(9.2)    $y' = x^2 + y^2,$

which does not possess an elementary solution (Liouville 1841, “J’ai donc pensé qu’il pouvait être bon de soumettre la question à une analyse exacte . . .”). Obviously, the solutions are not unique. Therefore, we prescribe an initial value

(9.3)    $y(x_0) = y_0.$

Euler’s Idea (Euler 1768, Sectio Secunda, Caput VII). We choose $h > 0$ and we replace the solution for $x_0 \le x \le x_0 + h$ by its tangent line

$y = y_0 + (x - x_0) \cdot f(x_0, y_0).$

For the point $x_1 = x_0 + h$ this gives $y_1 = y_0 + h f(x_0, y_0)$. At this point we compute again the new direction and repeat the above procedure in order to obtain the “valores successivi”

(9.4)    $x_{n+1} = x_n + h, \qquad y_{n+1} = y_n + h f(x_n, y_n).$

FIGURE 9.1. Prescribed slopes for $y' = x^2 + y^2$ with four solutions


This is Euler’s method. The function that is obtained by connecting all these tangents is called Euler’s polygon. If we let $h \to 0$, these polygons approach the solution more and more closely (see Fig. 9.2).


Numerical Experiment. We consider the differential equation (9.2), choose the initial values $x_0 = -1.5$, $y_0 = -1.4$, and the step sizes $h = 1/4, 1/8, 1/16, 1/32$. The resulting Euler polygons are plotted in Fig. 9.2. The numerical approximation and the errors at $x = 0$ are shown in Table 9.1. We observe that the error decreases by a factor of 2 whenever the step size is halved (“quot” denotes the quotient between the errors for two successive step sizes). An explanation of this fact can be found in any textbook on numerical analysis (e.g., Hairer, Nørsett, & Wanner 1993, Sect. II.3, p. 159).
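The experiment can be repeated with a few lines of code. The sketch below implements the recursion (9.4); the function names `euler` and `riccati` are chosen here for illustration, and the grid points are computed as $x_0 + n h$ to limit rounding drift.

```python
def euler(f, x0, y0, h, steps):
    """Euler's method (9.4): return the approximation after the given number of steps."""
    x, y = x0, y0
    for n in range(steps):
        y = y + h * f(x, y)          # follow the tangent at (x_n, y_n)
        x = x0 + (n + 1) * h
    return y


def riccati(x, y):
    """Right-hand side of Riccati's equation (9.2)."""
    return x * x + y * y


# Integrate from x0 = -1.5 to x = 0, i.e. 1.5/h steps.
for inv_h in (4, 8, 16, 32):
    print(inv_h, euler(riccati, -1.5, -1.4, 1 / inv_h, 3 * inv_h // 2))
```

For $h = 1/4$ the six steps can also be followed by hand and lead to the value $0.7246051$ listed in Table 9.1.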


TABLE 9.1. Euler’s method

  1/h      y(0)         error       quot
    4    0.7246051   -0.6762019
    8    0.2968225   -0.2484192    2.722
   16    0.1577289   -0.1093256    2.272
   32    0.0999576   -0.0515543    2.121
   64    0.0734660   -0.0250628    2.057
  128    0.0607632   -0.0123599    2.028
  256    0.0545412   -0.0061380    2.014
  512    0.0514618   -0.0030586    2.007

TABLE 9.2. Method (9.5)

  1/h      y(0)         error       quot
    2   -0.7330279    0.7814312
    4   -0.1063739    0.1547771    5.049
    8    0.0153874    0.0330159    4.688
   16    0.0409854    0.0074179    4.451
   32    0.0466509    0.0017523    4.233
   64    0.0479776    0.0004257    4.116
  128    0.0482984    0.0001049    4.058
  256    0.0483772    0.0000260    4.029

FIGURE 9.2. Polygons for $y' = x^2 + y^2$

FIGURE 9.3. Parabolas of order 2


Taylor Series Method 

PROBLEM 86: Improve significantly the above method of approximate
integration of differential equations, so that the result be closer to the truth.

(Euler 1768, §656) 

We note that (9.4) represents the first two terms of Taylor’s series. In order to improve the precision, let us use three terms so that

(9.5)    $y_{n+1} = y_n + h\, y_n' + \frac{h^2}{2!}\, y_n''.$

We have $y_n' = f(x_n, y_n)$, and for the computation of $y_n''$ we simply differentiate the differential equation with respect to $x$. This gives, for $y' = x^2 + y^2$,

(9.6)    $y'' = 2x + 2y y' = 2x + 2x^2 y + 2y^3.$

The numerical results obtained by (9.5) with h = 1/2, 1/4, 1/8, and 1/16 are 
shown in Fig. 9.3. We have replaced the polygons of Euler’s method by “poly- 
parabolas” composed of the truncated Taylor series. The errors at x = 0 are 
presented in Table 9.2. For small h the results are much better than for Euler’s 
method; halving the step size divides the error by 4. 
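A minimal sketch of the three-term method (9.5) for Riccati’s equation, with $y''$ taken from (9.6), is given below. The function name `taylor2_riccati` is an assumption of this illustration.

```python
def taylor2_riccati(x0, y0, h, steps):
    """Method (9.5) for y' = x^2 + y^2, using y'' = 2x + 2y*y' from (9.6)."""
    x, y = x0, y0
    for n in range(steps):
        d1 = x * x + y * y            # y'_n
        d2 = 2 * x + 2 * y * d1       # y''_n
        y = y + h * d1 + h * h / 2 * d2
        x = x0 + (n + 1) * h
    return y


# Same initial values as before; integrate from -1.5 to 0.
for inv_h in (2, 4, 8, 16):
    print(inv_h, taylor2_riccati(-1.5, -1.4, 1 / inv_h, 3 * inv_h // 2))
```

With $h = 1/2$ only three steps are needed, and a hand computation reproduces the first entry $-0.7330279$ of Table 9.2.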

Remark. It is of course possible to take additional terms of the Taylor series into account, e.g.,

(9.7)    $y_{n+1} = y_n + h\, y_n' + \frac{h^2}{2!}\, y_n'' + \frac{h^3}{3!}\, y_n'''.$

The higher derivatives are obtained by iterated differentiation of the differential equation. For Riccati’s equation we obtain from (9.6)

$y''' = 2 + 2y'y' + 2y y'' = 2 + 4xy + 2x^4 + 8x^2 y^2 + 6y^4,$
$y'''' = 4y + 12x^3 + 20xy^2 + 16x^4 y + 40x^2 y^3 + 24y^5,$ etc.

FIGURE 9.6. Numerical solutions for the pendulum (9.8')


Second-Order Equations 

Consider, for example, the pendulum equation (7.23)

(9.8)    $y'' = -\sin y.$

We introduce a new variable for $y'$ so that (9.8) becomes

(9.8')    $y' = v, \qquad v' = -\sin y.$

This system can be interpreted as a vector field, which prescribes at each point $(y, v)$ a velocity of the point $(y(x), v(x))$ moving with $x$ (Fig. 9.4). The solutions $(y(x), v(x))$ constantly respect the prescribed velocity. They are sketched in Fig. 9.5. The ovals represent the oscillations; the sinusoids are the rotations of a pendulum that turns over.

Euler’s Method. The idea (Cauchy 1824) is to apply Euler’s method (9.4) to both functions $y(x)$ and $v(x)$. If $y(x_0) = y_0$ and $v(x_0) = v_0$ are given initial values and $h > 0$ is a chosen step size, the analog of (9.4) applied to (9.8') is

(9.9)    $x_{n+1} = x_n + h, \qquad y_{n+1} = y_n + h \cdot v_n, \qquad v_{n+1} = v_n - h \cdot \sin(y_n).$

Fig. 9.6 shows Euler’s polygons for the initial values $y(0) = 1.2$, $v(0) = 0$, and for $h = 0.15$. We observe that our tremendous method predicts that the pendulum, in contrast to physical reality, accelerates and finally turns over.
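The acceleration of the numerical pendulum can be made visible through the quantity $v^2/2 - \cos y$, which is constant along exact solutions of (9.8'). The sketch below applies (9.9) and records this energy; the names `euler_pendulum` and `energy` are introduced only for this illustration.

```python
import math


def euler_pendulum(y0, v0, h, steps):
    """Iterate (9.9) and return the list of points (y_n, v_n)."""
    y, v = y0, v0
    path = [(y, v)]
    for _ in range(steps):
        # The tuple assignment evaluates both right-hand sides with the old values.
        y, v = y + h * v, v - h * math.sin(y)
        path.append((y, v))
    return path


def energy(y, v):
    """v^2/2 - cos y, conserved by the exact flow of y'' = -sin y."""
    return 0.5 * v * v - math.cos(y)


path = euler_pendulum(1.2, 0.0, 0.15, 10)
energies = [energy(y, v) for y, v in path]
print(energies[0], energies[-1])
```

A Taylor expansion of one step shows that the energy changes by $h^2 (\sin^2 y_n + v_n^2 \cos\xi)/2$ with $\xi$ between $y_n$ and $y_{n+1}$, which is positive as long as the angles stay below $\pi/2$ in absolute value; this is the mechanism behind the spurious acceleration.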

Taylor Series Method. Differentiating (9.8') with respect to $x$, we obtain

(9.10)    $y'' = v' = -\sin y, \qquad v'' = -\cos y \cdot y' = -\cos y \cdot v,$

which allow us to use an additional term of the Taylor series. The analog of Eq. (9.5) becomes

(9.11)    $y_{n+1} = y_n + h\, y_n' + \frac{h^2}{2}\, y_n'' = y_n + h v_n - \frac{h^2}{2} \sin y_n,$
          $v_{n+1} = v_n + h\, v_n' + \frac{h^2}{2}\, v_n'' = v_n - h \sin(y_n) - \frac{h^2}{2} \cos y_n \cdot v_n.$



The results (see Fig. 9.6 to the right) are much better even for h twice as large. 
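To compare the two methods quantitatively, one can measure how far each lets the quantity $v^2/2 - \cos y$ drift, since it is constant for the exact solution. The following sketch does this for a few steps; the function names are hypothetical and the choice of ten steps is arbitrary.

```python
import math


def euler_pendulum_step(y, v, h):
    """One step of (9.9)."""
    return y + h * v, v - h * math.sin(y)


def taylor2_pendulum_step(y, v, h):
    """One step of (9.11) for y' = v, v' = -sin y."""
    y_new = y + h * v - h * h / 2 * math.sin(y)
    v_new = v - h * math.sin(y) - h * h / 2 * math.cos(y) * v
    return y_new, v_new


def energy_drift(step, y0, v0, h, steps):
    """Change of v^2/2 - cos y after the given number of steps."""
    y, v = y0, v0
    e0 = 0.5 * v * v - math.cos(y)
    for _ in range(steps):
        y, v = step(y, v, h)
    return 0.5 * v * v - math.cos(y) - e0


print(energy_drift(euler_pendulum_step, 1.2, 0.0, 0.15, 10))
print(energy_drift(taylor2_pendulum_step, 1.2, 0.0, 0.15, 10))
```

The drift of (9.11) is of order $h^3$ per step instead of $h^2$, so it comes out much smaller for the same step size.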
Exercises

9.1 Apply the method of Euler with $h = 1/N$ to the equation

$y' = \lambda y, \qquad y(0) = 1$

in order to obtain an approximation of $y(1) = e^{\lambda}$. The result is a well-known formula of Chap. I.

9.2 (Inverse Error Function). Define a function $y(x)$ by the relation

$x = \frac{2}{\sqrt{\pi}} \int_0^{y} e^{-t^2}\, dt.$

Differentiate this formula and show that $y(x)$ satisfies the differential equation

$y' = \frac{\sqrt{\pi}}{2}\, e^{y^2}, \qquad y(0) = 0.$

Compute the first four terms of the Taylor series for $y(x)$ (developed at the point $x = 0$).

9.3 (Van der Pol’s Equation). Compute $y^{(i)}$ and $v^{(i)}$ for $i = 1, 2, 3$ for the solutions of the differential equation

$y' = v,$
$v' = \varepsilon (1 - y^2) v - y,$

and compute numerically the solution using the third-order Taylor series method for $\varepsilon = 0.3$, the initial values $y(0) = 2.00092238555422$, $v(0) = 0$, and for $0 \le x \le 6.31844320345412$. The correct solution is periodic for this interval and the given initial values.
II.10 The Euler-Maclaurin Summation Formula

The King calls me “my Professor”, and I am the happiest man in the world! 

(Euler is proud to serve Frederick II in Berlin) 
I have here a geometer who is a big cyclops . . . who has only one eye left, 
and a new curve, which he is presently computing, could render him totally 
blind. (Frederick II; see Spiess 1929, p. 165-166.) 

This formula was developed independently by Euler (1736) and Maclaurin (1742) as a powerful tool for the computation of sums such as the harmonic sum $1 + \frac12 + \frac13 + \ldots + \frac1n$, the sum of logarithms $\ln 2 + \ln 3 + \ln 4 + \ldots + \ln n = \ln n!$, the sum of powers $1^k + 2^k + 3^k + \ldots + n^k$, or the sum of reciprocal powers $1 + \frac{1}{2^k} + \frac{1}{3^k} + \ldots + \frac{1}{n^k}$, with the help of differential calculus.

Problem. For a given function $f(x)$, find a formula for

(10.1)    $S = f(1) + f(2) + f(3) + \ldots + f(n) = \sum_{i=1}^{n} f(i)$

(“investigatio summae serierum ex termino generali”).


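Euler’s Derivation of the Formula

The first idea (see Euler 1755, pars posterior, § 105, Maclaurin 1742, Book II, Chap. IV, p. 663f) is to consider also the sum with shifted arguments

(10.2)    $s = f(0) + f(1) + f(2) + \ldots + f(n - 1).$

We compute the difference $S - s$ using Taylor’s series (Eq. (2.8) with $x - x_0 = -1$)

$f(i - 1) - f(i) = -\frac{f'(i)}{1!} + \frac{f''(i)}{2!} - \frac{f'''(i)}{3!} + \ldots$

and find

$f(n) - f(0) = \sum_{i=1}^{n} f'(i) - \frac{1}{2!} \sum_{i=1}^{n} f''(i) + \frac{1}{3!} \sum_{i=1}^{n} f'''(i) - \frac{1}{4!} \sum_{i=1}^{n} f''''(i) + \ldots$

In order to turn this formula for $\sum f'(i)$ into a formula for $\sum f(i)$, we replace $f$ by its primitive (again denoted by $f$):

(10.3)    $\sum_{i=1}^{n} f(i) = \int_0^n f(x)\, dx + \frac{1}{2!} \sum_{i=1}^{n} f'(i) - \frac{1}{3!} \sum_{i=1}^{n} f''(i) + \frac{1}{4!} \sum_{i=1}^{n} f'''(i) - \ldots$

The second idea is to remove the sums $\sum f'$, $\sum f''$, $\sum f'''$ on the right by using the same formula, with $f$ successively replaced by $f'$, $f''$, $f'''$ etc. This will lead to a formula of the type

(10.4)    $\sum_{i=1}^{n} f(i) = \int_0^n f(x)\, dx - \alpha \bigl(f(n) - f(0)\bigr) + \beta \bigl(f'(n) - f'(0)\bigr) - \gamma \bigl(f''(n) - f''(0)\bigr) + \delta \bigl(f'''(n) - f'''(0)\bigr) - \ldots$

For the computation of the coefficients $\alpha, \beta, \gamma, \ldots$ we successively replace $f$ in (10.4) by $f', f'', \ldots$ to obtain

$\sum f(i) = \int_0^n f(x)\, dx - \alpha \bigl(f(n) - f(0)\bigr) + \beta \bigl(f'(n) - f'(0)\bigr) - \ldots$

$-\frac{1}{2!} \sum f'(i) = -\frac{1}{2!} \bigl(f(n) - f(0)\bigr) + \frac{\alpha}{2!} \bigl(f'(n) - f'(0)\bigr) - \ldots$

$\frac{1}{3!} \sum f''(i) = +\frac{1}{3!} \bigl(f'(n) - f'(0)\bigr) - \ldots$

The sum of all this, by (10.3), has to be $\int_0^n f(x)\, dx$. Therefore, we obtain

(10.5)    $\alpha + \frac{1}{2!} = 0, \qquad \beta + \frac{\alpha}{2!} + \frac{1}{3!} = 0, \qquad \gamma + \frac{\beta}{2!} + \frac{\alpha}{3!} + \frac{1}{4!} = 0, \qquad \ldots$

from which we can compute $\alpha = -\frac12$, $\beta = \frac{1}{12}$, $\gamma = 0$, $\delta = -\frac{1}{720}$, . . . and we have

(10.6)    $\sum_{i=1}^{n} f(i) = \int_0^n f(x)\, dx + \frac12 \bigl(f(n) - f(0)\bigr) + \frac{1}{12} \bigl(f'(n) - f'(0)\bigr) - \frac{1}{720} \bigl(f'''(n) - f'''(0)\bigr) + \frac{1}{30240} \bigl(f^{(5)}(n) - f^{(5)}(0)\bigr) + \ldots$
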

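(10.1) Example. This formula, applied to a sum of nearly a million terms,

$\frac{1}{11} + \frac{1}{12} + \frac{1}{13} + \ldots + \frac{1}{1000000} = \ln(1000000) - \ln(10) - \frac{1}{20} + \frac{1}{2000000} + \frac{1}{1200} - \frac{1}{12 \cdot 10^{12}} - \frac{1}{120 \cdot 10^{4}} + \frac{1}{120 \cdot 10^{24}} + \frac{1}{252 \cdot 10^{6}} - \ldots,$

gives an excellent approximation of the exact result by a couple of terms only. The formula is, however, of no use for the computation of the first terms $1 + \frac12 + \ldots + \frac{1}{10}$.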
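The quality of the approximation can be checked on a shorter sum, where direct summation is cheap. The sketch below applies (10.6) to $f(x) = 1/(x + a)$, truncated after the $f^{(5)}$ term; the function name `harmonic_tail_em` and its parameters `first` and `last` are assumptions of this illustration.

```python
import math


def harmonic_tail_em(first, last):
    """Approximate 1/first + ... + 1/last by (10.6) with f(x) = 1/(x + first - 1)."""
    a, b = first - 1, last            # f(0) = 1/a and f(n) = 1/b with n = b - a
    return (math.log(b) - math.log(a)             # integral of f from 0 to n
            + (1 / b - 1 / a) / 2                 # (1/2)(f(n) - f(0))
            + (1 / a ** 2 - 1 / b ** 2) / 12      # (1/12)(f'(n) - f'(0))
            - (1 / a ** 4 - 1 / b ** 4) / 120     # -(1/720)(f'''(n) - f'''(0))
            + (1 / a ** 6 - 1 / b ** 6) / 252)    # (1/30240)(f^(5)(n) - f^(5)(0))


direct = sum(1 / k for k in range(11, 1001))
print(direct, harmonic_tail_em(11, 1000))
```

For `first = 11` the first omitted term is of size about $1/(240 \cdot 10^8)$, so the two numbers agree to roughly ten digits; for a small starting index the truncated series is far less useful, in line with the remark above.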

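Bernoulli Numbers. It is customary to replace the coefficients $\alpha, \beta, \gamma, \ldots$ by $B_i/i!$ ($B_0 = 1$, $\alpha = B_1/1!$, $\beta = B_2/2!$, . . .), so that (10.5) becomes

(10.5')    $2B_1 + B_0 = 0, \qquad 3B_2 + 3B_1 + B_0 = 0, \qquad \ldots, \qquad \sum_{i=0}^{k-1} \binom{k}{i} B_i = 0.$

The Bernoulli numbers, as far as Euler calculated them, are

$B_0 = 1, \quad B_1 = -\frac12, \quad B_2 = \frac16, \quad B_4 = -\frac{1}{30}, \quad B_6 = \frac{1}{42}, \quad B_8 = -\frac{1}{30},$

$B_{10} = \frac{5}{66}, \quad B_{12} = -\frac{691}{2730}, \quad B_{14} = \frac{7}{6}, \quad B_{16} = -\frac{3617}{510}, \quad B_{18} = \frac{43867}{798},$

$B_{20} = -\frac{174611}{330}, \quad B_{22} = \frac{854513}{138}, \quad B_{24} = -\frac{236364091}{2730},$

$B_{26} = \frac{8553103}{6}, \quad B_{28} = -\frac{23749461029}{870}, \quad B_{30} = \frac{8615841276005}{14322},$

and $B_3 = B_5 = \ldots = 0$. In this notation, Eq. (10.6) becomes

(10.6')    $\sum_{i=1}^{n} f(i) = \int_0^n f(x)\, dx + \frac12 \bigl(f(n) - f(0)\bigr) + \sum_{k \ge 2} \frac{B_k}{k!} \bigl(f^{(k-1)}(n) - f^{(k-1)}(0)\bigr).$

Example. For $f(x) = x^q$ the series of Eq. (10.6') is finite and gives the well-known formula of Jac. Bernoulli (I.1.28), (I.1.29).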
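The recursion (10.5') determines the Bernoulli numbers one after the other, which can be sketched with exact rational arithmetic. The function name `bernoulli_numbers` is chosen here for illustration; `fractions.Fraction` and `math.comb` are from the Python standard library.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] from sum_{i<k} C(k, i) B_i = 0 (k >= 2), so B_1 = -1/2."""
    B = [Fraction(1)]
    for k in range(2, m + 2):
        # The equation for k contains B_{k-1} with coefficient C(k, k-1) = k.
        s = sum(comb(k, i) * B[i] for i in range(k - 1))
        B.append(-s / k)
    return B


B = bernoulli_numbers(30)
print(B[12], B[30])
```

Solving each equation for its highest-index unknown keeps the computation triangular, so no linear system has to be set up.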


Generating Function. In order to get more insight into the Bernoulli numbers, we apply one of Euler’s great ideas: consider the function $V(u)$ whose Taylor coefficients are the numbers under consideration, i.e., define

(10.7)    $V(u) = 1 + \alpha u + \beta u^2 + \gamma u^3 + \delta u^4 + \ldots = 1 + \frac{B_1}{1!}\, u + \frac{B_2}{2!}\, u^2 + \frac{B_3}{3!}\, u^3 + \frac{B_4}{4!}\, u^4 + \ldots$

Now the formulas (10.5) alias (10.5') say simply that

$V(u) \cdot \Bigl(1 + \frac{u}{2!} + \frac{u^2}{3!} + \frac{u^3}{4!} + \ldots\Bigr) = 1,$

that is,

(10.8)    $V(u) = \frac{u}{e^u - 1}.$

Thus, the infinitely many algebraic equations become one analytic formula. The fact that

(10.9)    $V(u) + \frac{u}{2} = \frac{u}{2} \cdot \frac{e^{u/2} + e^{-u/2}}{e^{u/2} - e^{-u/2}}$

is an even function shows that $B_3 = B_5 = B_7 = \ldots = 0$.
De Usu Legitimo Formulae Summatoriae Maclaurinianae

We now insert $f(x) = \cos(2\pi x)$, for which $f(i) = 1$ for all $i$, into Eq. (10.6'). This gives $1 + 1 + \ldots + 1$ to the left, and $0 + 0 + 0 + \ldots$ to the right, because $\cos(2\pi x)$ together with all its derivatives is periodic with period 1. We see that the formula as it stands is wrong! Another problem is that for most functions $f$ the infinite series in (10.6') usually does not converge.

It is therefore necessary to truncate the formula after a finite number of terms 
and to obtain an expression for the remainder. This was done in beautiful Latin 
(see above) by Jacobi (1834) by rearranging Euler’s proof using the error term 
(4.32) of Bernoulli-Cauchy throughout. It was later discovered (Wirtinger 1902) 
that the proof can be done simply by repeated integration by parts in a similar 
manner to the proof of Eq. (4.32). The main ingredient of the proof is the so-called 
Bernoulli polynomials. 

Bernoulli Polynomials. The polynomials

$B_1(x) = B_0 x + B_1 = x - \frac12$

$B_2(x) = B_0 x^2 + 2B_1 x + B_2 = x^2 - x + \frac16$

$B_3(x) = B_0 x^3 + 3B_1 x^2 + 3B_2 x + B_3 = x^3 - \frac32 x^2 + \frac12 x$

$B_4(x) = B_0 x^4 + 4B_1 x^3 + 6B_2 x^2 + 4B_3 x + B_4 = x^4 - 2x^3 + x^2 - \frac{1}{30},$

or, in general,

(10.10)    $B_k(x) = \sum_{i=0}^{k} \binom{k}{i} B_i\, x^{k-i},$

satisfy

(10.11)    $B_k'(x) = k B_{k-1}(x), \qquad B_k(0) = B_k(1) = B_k \quad (k \ge 2).$

Indeed, the first formula of (10.11) is a property of the binomial coefficients (see Theorem I.2.1); the second formula follows from the definition and from (10.5').
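The second property in (10.11) can be checked with exact arithmetic. The sketch below evaluates (10.10) for rational arguments; the helper names `bernoulli_numbers` and `bernoulli_poly` are hypothetical, and the Bernoulli numbers are generated from the recursion (10.5').

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """[B_0, ..., B_m] from the recursion (10.5')."""
    B = [Fraction(1)]
    for k in range(2, m + 2):
        B.append(-sum(comb(k, i) * B[i] for i in range(k - 1)) / k)
    return B


def bernoulli_poly(k, x, B):
    """Evaluate B_k(x) = sum_i C(k, i) B_i x^(k-i), Eq. (10.10)."""
    return sum(comb(k, i) * B[i] * x ** (k - i) for i in range(k + 1))


B = bernoulli_numbers(6)
zero, one, half = Fraction(0), Fraction(1), Fraction(1, 2)
for k in range(2, 7):
    print(k, bernoulli_poly(k, zero, B), bernoulli_poly(k, one, B))
print(bernoulli_poly(4, half, B))   # value of B_4 at the midpoint
```

At $x = 1$ the sum (10.10) becomes $\sum_{i=0}^{k} \binom{k}{i} B_i$, which by (10.5') reduces to the single term $B_k$; this is the identity the loop displays.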

(10.2) Theorem. We have

$\sum_{i=1}^{n} f(i) = \int_0^n f(x)\, dx + \frac12 \bigl(f(n) - f(0)\bigr) + \sum_{j=2}^{k} \frac{B_j}{j!} \bigl(f^{(j-1)}(n) - f^{(j-1)}(0)\bigr) + \widetilde{R}_k,$

where

(10.12)    $\widetilde{R}_k = \frac{(-1)^{k-1}}{k!} \int_0^n \widetilde{B}_k(x)\, f^{(k)}(x)\, dx.$

Here, $\widetilde{B}_k(x)$ is equal to $B_k(x)$ for $0 \le x \le 1$ and extended periodically with period 1 (see Fig. 10.1).

FIGURE 10.1. Bernoulli polynomials


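Proof. We start by proving the statement for $n = 1$. Using $B_1'(x) = 1$ and integrating
by parts we have

$\int_0^1 f(x)\,dx = \int_0^1 B_1'(x) f(x)\,dx = B_1(x) f(x)\Big|_0^1 - \int_0^1 B_1(x) f'(x)\,dx$.

The first term is $\frac{1}{2}(f(1) + f(0))$. In the second term we insert from (10.11)
$B_1(x) = \frac{1}{2} B_2'(x)$ and integrate once again. This gives

$\int_0^1 f(x)\,dx = \frac{1}{2}\bigl(f(1) + f(0)\bigr) - \frac{B_2}{2!}\bigl(f'(1) - f'(0)\bigr) + \frac{1}{2!}\int_0^1 B_2(x) f''(x)\,dx$

or, continuing like this,

(10.13)  $\frac{1}{2}\bigl(f(1) + f(0)\bigr) = \int_0^1 f(x)\,dx + \sum_{j=2}^{k} \frac{B_j}{j!}\bigl(f^{(j-1)}(1) - f^{(j-1)}(0)\bigr) + R_k$,

with

(10.14)  $R_k = \frac{(-1)^{k-1}}{k!} \int_0^1 B_k(x)\, f^{(k)}(x)\,dx$.

We next apply Eq. (10.13) to the shifted functions $f(x + i - 1)$, observe that

$\int_0^1 B_k(x)\, f^{(k)}(x + i - 1)\,dx = \int_{i-1}^{i} \tilde B_k(x)\, f^{(k)}(x)\,dx$,

and obtain the statement of Theorem 10.2 by summing these formulas from $i = 1$
to $i = n$. □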
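For a polynomial $f$ the remainder vanishes as soon as $k$ exceeds the degree, so the formula of Theorem 10.2 becomes an exact identity. The sketch below tests this for $f(x) = x^p$ with $k = p + 1$; the function names are hypothetical, and the Bernoulli numbers are again generated from the standard recursion (an assumption about what (10.5') states).

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(kmax):
    """Return [B_0, ..., B_kmax] (with B_1 = -1/2) as exact fractions."""
    B = [Fraction(1)]
    for k in range(2, kmax + 2):
        partial = sum(comb(k, i) * B[i] for i in range(k - 1))
        B.append(-partial / comb(k, k - 1))
    return B[:kmax + 1]

def euler_maclaurin_power_sum(p, n):
    """Right-hand side of Theorem 10.2 for f(x) = x**p (p >= 1), k = p + 1.

    The remainder is zero because the (p+1)-th derivative of f vanishes.
    """
    B = bernoulli_numbers(p)
    total = Fraction(n ** (p + 1), p + 1)   # integral of x^p over [0, n]
    total += Fraction(n ** p, 2)            # (f(n) - f(0)) / 2 with f(0) = 0
    for j in range(2, p + 1):
        d = j - 1                           # order of the derivative, d < p
        deriv_at_n = Fraction(factorial(p), factorial(p - d)) * n ** (p - d)
        total += B[j] / factorial(j) * deriv_at_n   # f^(d)(0) = 0 since d < p
    # The term j = p + 1 is omitted: f^(p) is constant, so its difference is 0.
    return total

for p in range(1, 8):
    for n in range(1, 12):
        assert euler_maclaurin_power_sum(p, n) == sum(i ** p for i in range(1, n + 1))
```

For $p = 3$ the right-hand side reduces to $n^4/4 + n^3/2 + n^2/4$, the familiar closed form of $1^3 + 2^3 + \ldots + n^3$.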
Estimating the Remainder. The estimates (for $0 \le x \le 1$)

$|B_1(x)| \le \frac{1}{2}, \quad |B_2(x)| \le \frac{1}{6}, \quad |B_3(x)| \le \frac{\sqrt{3}}{36}, \quad |B_4(x)| \le \frac{1}{30}$,

which are easy to check, and the fact that $\bigl|\int_0^n g(x)\,dx\bigr| \le \int_0^n |g(x)|\,dx$, show that

(10.15)  $|\tilde R_1| \le \frac{1}{2}\int_0^n |f'(x)|\,dx, \qquad |\tilde R_2| \le \frac{1}{12}\int_0^n |f''(x)|\,dx, \quad \ldots$

These are the desired rigorous estimates of the remainder of Euler-Maclaurin’s
summation formula. Further maximal and minimal values of the Bernoulli
polynomials have been computed by Lehmer (1940); see Exercise 10.3.

(10.3) Remark. If we apply the formula of Theorem 10.2 to the function $f(t) =
h\,g(a + th)$ with $h = (b - a)/n$ and if we pass the term $(f(n) - f(0))/2$ to the
left side, we obtain (with $x_i = a + ih$)

(10.16)  $\frac{h}{2} g(x_0) + h \sum_{i=1}^{n-1} g(x_i) + \frac{h}{2} g(x_n) = \int_a^b g(x)\,dx + \sum_{j=2}^{k} \frac{h^j B_j}{j!}\bigl(g^{(j-1)}(b) - g^{(j-1)}(a)\bigr) + (-1)^{k-1} \frac{h^{k+1}}{k!} \int_0^n \tilde B_k(t)\, g^{(k)}(a + th)\,dt$,

where we recognize on the left the trapezoidal rule. Equation (10.16) shows that
the dominating term of the error is $(h^2/12)\,(g'(b) - g'(a))$. However, if $g$ is
periodic, then all terms in the Euler-Maclaurin series disappear and the error is equal
to $\tilde R_k$ for an arbitrary $k$; this explains the surprisingly good results of Table 6.2
(Sect. II.6).
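Both statements of the remark can be observed numerically. The sketch below uses two illustrative integrands chosen here (they are assumptions, not necessarily those of Table 6.2): $g(x) = e^x$ on $[0, 1]$, where the error should be close to $(h^2/12)(g'(b) - g'(a))$, and the $2\pi$-periodic function $1/(2 + \cos x)$ integrated over one period, whose exact integral is $2\pi/\sqrt{3}$.

```python
import math

def trapezoid(g, a, b, n):
    """Trapezoidal rule with n equal subintervals: the left side of (10.16)."""
    h = (b - a) / n
    inner = sum(g(a + i * h) for i in range(1, n))
    return h * (0.5 * g(a) + inner + 0.5 * g(b))

# Non-periodic integrand: the error is dominated by (h^2 / 12) (g'(b) - g'(a)).
n = 20
h = 1.0 / n
err = trapezoid(math.exp, 0.0, 1.0, n) - (math.e - 1.0)
predicted = h ** 2 / 12 * (math.exp(1.0) - math.exp(0.0))

# Periodic integrand over a full period: all correction terms cancel.
def g_periodic(x):
    return 1.0 / (2.0 + math.cos(x))

exact_periodic = 2.0 * math.pi / math.sqrt(3.0)
err_periodic = trapezoid(g_periodic, 0.0, 2.0 * math.pi, n) - exact_periodic

print(f"exp:      error {err:.3e}, predicted {predicted:.3e}")
print(f"periodic: error {err_periodic:.3e}")
```

With the same 20 subintervals, the error for the periodic integrand is many orders of magnitude smaller than for the exponential.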


Stirling’s Formula 

We put $f(x) = \ln x$ in the Euler-Maclaurin formula. Since

$\sum_{i=2}^{n} f(i) = \ln 2 + \ln 3 + \ln 4 + \ln 5 + \ldots + \ln n = \ln(n!)$,

we will obtain an approximate expression for the factorials $n! = 1 \cdot 2 \cdot \ldots \cdot n$.

(10.4) Theorem (Stirling 1730). We have

(10.17)  $n! = \frac{\sqrt{2\pi n}\; n^n}{e^n} \cdot \exp\Bigl(\frac{1}{12n} - \frac{1}{360n^3} + \frac{1}{1260n^5} - \frac{1}{1680n^7} + \hat R_9\Bigr)$,

where $|\hat R_9| \le 0.0006605/n^8$. This gives, for $n \to \infty$, the approximation

(10.18)  $n! \approx \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^n$.


Remark. This famous formula is especially useful in combinatorial analysis,
statistics, and probability theory. Equation (10.17) is truncated after the 4th term simply
because one additional term would not fit into the same line.

The numerical values of (10.18) and (10.17) (with one, two and three terms)
for n = 10 and n = 100 are compared to n! in Table 10.1.


TABLE 10.1. Factorial function and approximations by Stirling’s formula


n = 10 :   Stirling 0 = 0.359869561874103592162317593283 · 10^7
           Stirling 1 = 0.362881005142693352994116531675 · 10^7
           Stirling 2 = 0.362879997141301292538591223941 · 10^7
           Stirling 3 = 0.362880000021301281279077612862 · 10^7
           n!         = 0.362880000000000000000000000000 · 10^7

n = 100 :  Stirling 0 = 0.932484762526934324776475612718 · 10^158
           Stirling 1 = 0.933262157031762340989619195146 · 10^158
           Stirling 2 = 0.933262154439367463946383356624 · 10^158
           Stirling 3 = 0.933262154439441532371338864918 · 10^158
           n!         = 0.933262154439441526816992388563 · 10^158
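The leading digits of Table 10.1 can be reproduced in ordinary double precision (about 16 significant digits, so not the 30 digits printed in the table). The following is a minimal sketch with a hypothetical helper `stirling(n, terms)`, where `terms = 0` corresponds to (10.18) and `terms = 1, 2, 3` to the first terms of the series in (10.17).

```python
import math

def stirling(n, terms):
    """sqrt(2 pi n) (n/e)^n times exp of the first `terms` terms of (10.17)."""
    series = [
        1 / (12 * n),
        -1 / (360 * n ** 3),
        1 / (1260 * n ** 5),
        -1 / (1680 * n ** 7),
    ]
    correction = math.exp(sum(series[:terms]))
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n * correction

for n in (10, 100):
    exact = math.factorial(n)
    for terms in range(4):
        approx = stirling(n, terms)
        rel_err = abs(approx - exact) / exact
        print(f"n = {n:3d}  Stirling {terms} = {approx:.12e}  rel. error = {rel_err:.1e}")
```

Each additional term of the series reduces the relative error by several orders of magnitude, in line with the table.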


Proof. We have seen above (Example 10.1) that the Euler-Maclaurin formula is
inefficient if the higher derivatives of $f(x)$ become large on the considered interval.
We therefore apply the formula with $f(x) = \ln x$ for the sum from $i = n + 1$
to $i = m$. Since

$\int \ln x\,dx = x \ln x - x, \qquad \frac{d^j}{dx^j}(\ln x) = (-1)^{j-1}\,\frac{(j-1)!}{x^j}$,

we obtain from Theorem 10.2 that

(10.19)  $\sum_{i=n+1}^{m} f(i) = \ln m! - \ln n! = m \ln m - m - (n \ln n - n) + \frac{1}{2}(\ln m - \ln n) + \frac{1}{12}\Bigl(\frac{1}{m} - \frac{1}{n}\Bigr) - \frac{1}{360}\Bigl(\frac{1}{m^3} - \frac{1}{n^3}\Bigr) + \tilde R_5$,

where $|\tilde R_5| \le 0.00123/n^4$ for all $m > n$. This estimate is obtained from (10.12)
and (10.15) and the fact that $|B_5(x)| \le 0.02446$ for $0 \le x \le 1$. In (10.19),
the terms $\ln n!$, $n \ln n$, $n$, and $(1/2) \ln n$ diverge individually for $n \to \infty$. We
therefore take them together and set
(10.20)  $\gamma_n = \ln n! + n - \bigl(n + \frac{1}{2}\bigr) \ln n$,

and (10.19) becomes

(10.21)  $\gamma_n = \gamma_m + \frac{1}{12}\Bigl(\frac{1}{n} - \frac{1}{m}\Bigr) - \frac{1}{360}\Bigl(\frac{1}{n^3} - \frac{1}{m^3}\Bigr) - \tilde R_5$.

For $n$ and $m$ sufficiently large $\gamma_n$ and $\gamma_m$ become arbitrarily close. Therefore, it
appears that the values $\gamma_m$ converge, for $m \to \infty$, to a value that we denote by $\gamma$
(the precise proof will be given in Theorem III.1.8 of Cauchy). We then take the
limit $m \to \infty$ in Eq. (10.21) and obtain


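$\ln n! + n - \bigl(n + \frac{1}{2}\bigr) \ln n = \gamma + \frac{1}{12n} - \frac{1}{360n^3} + \hat R_5$,

where $\hat R_5$ denotes the limit of $-\tilde R_5$ for $m \to \infty$, so that $|\hat R_5| \le 0.00123/n^4$.
Taking the exponential function of this expression we get

(10.22)  $n! = D_n\,\frac{\sqrt{n}\; n^n}{e^n}$  with  $D_n = e^{\gamma} \cdot \exp\Bigl(\frac{1}{12n} - \frac{1}{360n^3} + \hat R_5\Bigr)$.

This proves (10.18) and also (10.17), as soon as we have seen that the limit of $D_n$
(i.e., $D = e^{\gamma}$) is actually equal to $\sqrt{2\pi}$. To this end, we compute, from (10.22),

$\frac{D_n \cdot D_n}{D_{2n}} = \frac{n! \cdot n! \cdot (2n)^{2n} \cdot e^{-2n} \cdot \sqrt{2n}}{n^{2n} \cdot e^{-2n} \cdot n \cdot (2n)!} = \frac{2 \cdot 4 \cdot 6 \cdot 8 \cdots 2n}{1 \cdot 3 \cdot 5 \cdot 7 \cdots (2n-1)} \cdot \frac{\sqrt{2}}{\sqrt{n}}$,

which tends to $D$, too. This formula reminds us of Wallis’s product of Eq. (I.5.27).
Indeed, its square,

$\Bigl(\frac{D_n \cdot D_n}{D_{2n}}\Bigr)^2 = \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots (2n)(2n)}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots (2n-1)(2n+1)} \cdot \frac{2(2n+1)}{n}$,

tends to $2\pi$, so that $D = \sqrt{2\pi}$. The stated estimate for $\hat R_9$ follows from (10.12)
and $|B_9(x)| \le 0.04756$. □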


The Harmonic Series and Euler’s Constant 

We try to compute

$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots + \frac{1}{n}$

by putting $f(x) = 1/x$ in Theorem 10.2. Since $f^{(j)}(x) = (-1)^j\, j!/x^{j+1}$, we get,
instead of (10.19),

(10.23)  $\sum_{i=n+1}^{m} \frac{1}{i} = \ln m - \ln n + \frac{1}{2}\Bigl(\frac{1}{m} - \frac{1}{n}\Bigr) - \frac{1}{12}\Bigl(\frac{1}{m^2} - \frac{1}{n^2}\Bigr) + \frac{1}{120}\Bigl(\frac{1}{m^4} - \frac{1}{n^4}\Bigr) - \frac{1}{252}\Bigl(\frac{1}{m^6} - \frac{1}{n^6}\Bigr) + \frac{1}{240}\Bigl(\frac{1}{m^8} - \frac{1}{n^8}\Bigr) + \tilde R_9$,

where, because of $|B_9(x)| \le 0.04756$, we have $|\tilde R_9| \le 0.00529/n^9$. The diverging
terms to collect will now be, instead of (10.20),

$\gamma_n = \sum_{i=1}^{n} \frac{1}{i} - \ln n$,

which is investigated precisely as above and seen to converge. This time, the
constant obtained,

(10.24)  $1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} - \ln n \;\to\; \gamma = 0.57721566490153286\ldots$

is a new constant in mathematics and is called “Euler’s constant” (see Fig. 10.2
for an autograph of Euler containing his constant and its use for the computation
of the sum of Example 10.1). Letting, as before, $m \to \infty$ in (10.23), we obtain

(10.25)  $\sum_{i=1}^{n} \frac{1}{i} = \gamma + \ln n + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{1}{252n^6} + \frac{1}{240n^8} + \hat R_9$,

where $\hat R_9$ denotes the limit of $-\tilde R_9$, so that $|\hat R_9| \le 0.00529/n^9$. To find the
constant $\gamma$, we put, for example, $n = 10$ (as did Euler) in Eq. (10.25) and obtain the
value of (10.24). This constant was computed with great precision by D. Knuth (1962).
It is still not known whether it is rational or irrational.

FIGURE 10.2. Euler’s autograph (letter to Joh. Bernoulli 1740, see Fellmann 1983, p. 96)¹
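The computation with $n = 10$ is easy to repeat. The sketch below solves the standard asymptotic expansion of the harmonic sum, $\gamma + \ln n + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{1}{252n^6} + \frac{1}{240n^8}$, for $\gamma$ and drops the remainder; the function name is hypothetical, and the result is limited by double precision as well as by the neglected remainder.

```python
import math
from fractions import Fraction

def euler_gamma_estimate(n):
    """Estimate Euler's constant from the harmonic sum up to n,
    using the asymptotic expansion with the remainder term dropped."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))   # exact rational sum
    correction = (
        1 / (2 * n)
        - 1 / (12 * n ** 2)
        + 1 / (120 * n ** 4)
        - 1 / (252 * n ** 6)
        + 1 / (240 * n ** 8)
    )
    return float(harmonic) - math.log(n) - correction

gamma_10 = euler_gamma_estimate(10)
print(f"gamma from n = 10: {gamma_10:.15f}")
print(f"reference value  : {0.57721566490153286:.15f}")
```

With $n = 10$ the remainder bound $0.00529/n^9$ is about $5 \cdot 10^{-12}$, so roughly eleven correct digits can be expected.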


¹ Reproduced with permission of Birkhaeuser Verlag, Basel.
Exercises


10.1 The spiral of Theodorus is composed of right-angled triangles with sides 1, $\sqrt{n}$,
and $\sqrt{n+1}$. It performs a complete rotation after 17 triangles (this seems to
be the reason why Theodorus did not consider roots beyond $\sqrt{17}$). No longer
prevented by such scruples, we now want to know how many rotations a
billion such triangles perform. This requires the calculation of (see Fig. 10.3)

$\frac{1}{2\pi} \sum_{i=1}^{1000000000} \arctan \frac{1}{\sqrt{i}}$

with an error smaller than 1. This exercise is not only a further occasion to
admire the power of the Euler-Maclaurin formula, but also leaves us with an
interesting integral to evaluate.

FIGURE 10.3. The spiral of Theodorus of Cyrene, 470-390 B.C.


10.2 (Formula for the Taylor series of $\tan x$). If we let $\cot x = 1/\tan x$ and
$\coth x = 1/\tanh x$, Eq. (10.9) can be seen to represent the Taylor series of
$(x/2) \coth(x/2)$. This allows us to obtain the series expansion of $x \cdot \coth x$,
and, by letting $x \mapsto ix$, that of $x \cdot \cot x$. Finally, use the formula

$2 \cdot \cot 2x = \cot x - \tan x$

and obtain the coefficients of the expansion of $\tan x$. Compare it with
Eq. (I.4.18).

10.3 Verify numerically the estimates (Lehmer 1940)

$|B_3(x)| \le 0.04812, \quad |B_5(x)| \le 0.02446, \quad |B_7(x)| \le 0.02607$,

$|B_9(x)| \le 0.04756, \quad |B_{11}(x)| \le 0.13250, \quad |B_{13}(x)| \le 0.52357$

for $0 \le x \le 1$.




III 

Foundations of Classical Analysis 


... I am not sure that I shall still do geometry ten years from now. I also 
think that the mine is already almost too deep, and must sooner or later be 
abandoned. Today, Physics and Chemistry offer more brilliant discoveries 
and which are easier to exploit . . . 

(Lagrange, Sept. 21, 1781, Letter to d’Alembert, Oeuvres, vol. 13, p. 368) 
Euler’s death in 1783 was followed by a period of stagnation in mathematics. He 
had indeed solved everything: an unsurpassed treatment of infinite and differential 
calculus (Euler 1748, 1755), solvable integrals solved, solvable differential equa- 
tions solved (Euler 1768, 1769), the secrets of liquids (Euler 1755b), of mechan- 
ics (Euler 1736b, Lagrange 1788), of variational calculus (Euler 1744), of algebra 
(Euler 1770), unveiled. It seemed that no other task remained than to study about 
30,000 pages of Euler’s work. 

The “Théorie des fonctions analytiques” by Lagrange (1797), “freed from
all considerations of infinitely small quantities, vanishing quantities, limits and 
fluxions”, the thesis of Gauss (1799) on the “Fundamental Theorem of Algebra” 
and the study of the convergence of the hypergeometric series (Gauss 1812) mark 
the beginning of a new era. 

Bolzano points out that Gauss’s first proof is lacking in rigor; he then gives
in 1817 a “purely analytic proof of the theorem, that between two values which
produce opposite signs, there exists at least one root of the equation” (Theorem
III.3.5 below). In 1821, Cauchy establishes new requirements of rigor in his
famous “Cours d’Analyse”. The questions are the following:

- What is a derivative really? Answer: a limit. 

- What is an integral really? Answer: a limit. 

- What is an infinite series $a_1 + a_2 + a_3 + \ldots$ really? Answer: a limit.



This leads to 

- What is a limit? Answer: a number. 


And, finally, the last question: 

- What is a number? 

Weierstrass and his collaborators (Heine, Cantor), as well as Méray, answer
that question around 1870-1872. They also fill many gaps in Cauchy’s proofs
by clarifying the notions of uniform convergence (see picture below), uniform
continuity, the term by term integration of infinite series, and the term by term
differentiation of infinite series.

Sections III.5, III.6, and III.7, on, respectively, the integral calculus, the
differential calculus, and infinite power series, will be the heart of this chapter. The
preparatory Sections III.1 through III.4 will enable us to build our theories on a
solid foundation. Section III.8 completes the integral calculus and Section III.9
presents two results of Weierstrass on continuous functions that were both
spectacular discoveries of the epoch.



Weierstrass explains uniform convergence to Cauchy 
who meditates over Abel’s counterexample 
(Drawing by K. Wanner) 
III.1 Infinite Sequences and Real Numbers

If, for every positive integer $n$, we are given a number $s_n$, then we speak of an
(infinite) sequence and we write

(1.1)  $\{s_n\} = \{s_1, s_2, s_3, s_4, s_5, \ldots\}$.

The number $s_n$ is called the nth term or the general term of the sequence.

A first example is

(1.2)  $\{1, 2, 3, 4, 5, 6, \ldots\}$,

which is an arithmetic progression. This means that the difference of two
successive terms is constant. The sequence

(1.3)  $\{q^0, q^1, q^2, q^3, q^4, q^5, \ldots\}$

is a geometric progression (the quotient of two successive terms is constant).

Convergence of a Sequence

One says that a quantity is the limit of another quantity, if the second
approaches the first closer than any given quantity, however small . . .

(D’Alembert 1765, Encyclopédie, tome neuvième, à Neufchastel.)

When a variable quantity converges towards a fixed limit, it is often useful
to indicate this limit by a specific notation, which we shall do by setting the
abbreviation

lim

in front of the variable in question . . .

(Cauchy 1821, Cours d’Analyse)
If the terms $s_n$ of a sequence (1.1) approach arbitrarily closely a number $s$ for $n$
large enough, we call this number the limit of (1.1). This concept is very important
and calls for more precision:

- “arbitrarily closely” means “closer than any positive number $\varepsilon$”, i.e.,
$|s_n - s| < \varepsilon$. Here, $|\cdot|$ is the absolute value and forces $s_n$ to be close to $s$ in
the positive and the negative direction.

- “for $n$ large enough” means that there must be an $N$ such that the above
estimate is true for all $n \ge N$.

With the symbols $\forall$ (“for all”) and $\exists$ (“there exists”), we can thus express the
above situation in the following compact form.

(1.1) Definition (D’Alembert 1765, Cauchy 1821). We say that a sequence (1.1)
converges if there exists a number $s$ such that

(1.4)  $\forall \varepsilon > 0 \quad \exists N \ge 1 \quad \forall n \ge N \qquad |s_n - s| < \varepsilon$.

We then write

(1.5)  $s = \lim_{n \to \infty} s_n$  or  $s_n \to s$.

If (1.4) is not true for any $s$, the sequence (1.1) is said to diverge.

FIGURE 1.1. Convergence of the sequence (1.6)


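(1.2) Examples. Consider the sequence

$\Bigl\{\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \frac{5}{6}, \frac{6}{7}, \ldots\Bigr\}$.

This sequence converges to 1, because

$|s_n - 1| < \varepsilon$,  where  $s_n = \frac{n}{n+1}$,

for $1/(n + 1) < \varepsilon$, hence for $n > 1/\varepsilon - 1$. Therefore, for a given $\varepsilon > 0$, we can
take for $N$ an integer that is larger than $1/\varepsilon - 1$ and condition (1.4) is verified.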
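The choice of $N$ in this example can be made concrete. The sketch below assumes the sequence $s_n = n/(n+1)$ (consistent with the bound $1/(n+1) < \varepsilon$ used above); the function name is hypothetical. It returns the smallest admissible $N$ for a given $\varepsilon$ and checks condition (1.4) on a finite range of indices in exact rational arithmetic.

```python
import math
from fractions import Fraction

def smallest_N(eps):
    """Smallest integer N >= 1 such that every n >= N satisfies n > 1/eps - 1."""
    return max(1, math.floor(1 / eps - 1) + 1)

def s(n):
    """General term of the example sequence, assumed to be n/(n+1)."""
    return Fraction(n, n + 1)

eps = Fraction(1, 100)
N = smallest_N(eps)

# Condition (1.4) with limit 1 holds from N on (checked on a finite range) ...
assert all(abs(s(n) - 1) < eps for n in range(N, N + 1000))
# ... and N cannot be lowered: the term just before N still violates it.
assert abs(s(N - 1) - 1) >= eps
print(N)
```

Any larger integer would serve equally well as $N$; Definition 1.1 only asks for the existence of one.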

As the next example, we choose the sequence 




(here $[i/2]$ denotes the largest integer $k$ not exceeding $i/2$; i.e., $[i/2] = k$ if $i =
2k$ or $i = 2k + 1$). This sequence is somewhat less trivial and is illustrated in
Fig. 1.1. It seems to converge to a number close to 1.13 (which we guess, after
our experience of Chap. I, to be $\pi/4 + (\ln 2)/2$). We observe that for a given $\varepsilon$
(here $\varepsilon = 0.058$), there is a last $s_n$ (here $s_{16}$) violating $|s_n - s| < \varepsilon$. Hence,
for $N = 17$, (1.4) is satisfied. The fact that several earlier terms ($s_3, s_5, \ldots$) also
satisfy this estimate does not contradict (1.4).
(1.3) Theorem. If a sequence $\{s_n\}$ converges, then it is bounded, i.e.,

(1.7)  $\exists B \quad \forall n \ge 1 \qquad |s_n| \le B$.

Proof. We put $\varepsilon = 1$. By the definition of convergence, we know the existence of
an integer $N$ such that $|s_n - s| < 1$ for all $n \ge N$. Using the triangle inequality
(see Exercise 1.1), we obtain $|s_n| = |s_n - s + s| \le |s_n - s| + |s| < 1 + |s|$ for $n \ge N$
and the statement is proved with $B = \max\{|s_1|, |s_2|, \ldots, |s_{N-1}|, |s| + 1\}$. □

For the boundedness of a sequence it is not necessary that it converge. For
example, the sequence

(1.8)  $\{s_n\} = \{1, 0, 1, 0, 1, 0, 1, 0, \ldots\}$

is bounded (with $B = 1$) but does not converge.

The sequence (1.2) is neither bounded nor does it converge. The general
arithmetic progression

(1.9)  $\{s_n\} = \{d, 2d, 3d, 4d, 5d, \ldots\}$

is also unbounded (for $d \ne 0$). For $d > 0$ this sequence satisfies

(1.10)  $\forall M > 0 \quad \exists N \ge 1 \quad \forall n \ge N \qquad s_n > M$.

To see this, take an integer $N$ satisfying $N > M/d$. If (1.10) is verified, we say
that the sequence $\{s_n\}$ tends to infinity and we write

$\lim_{n \to \infty} s_n = \infty$  or  $s_n \to \infty$.

In a similar way, one can define $\lim_{n \to \infty} s_n = -\infty$. We next investigate the
convergence of sequence (1.3).

(1.4) Lemma. For the geometric progression (1.3), we have

$\lim_{n \to \infty} q^n = 0$ for $|q| < 1$,
$\lim_{n \to \infty} q^n = 1$ for $q = 1$,
$\lim_{n \to \infty} q^n = \infty$ for $q > 1$.

The sequence (1.3) diverges for $q \le -1$.

Proof. Let us start with the case $q > 1$. We write $q = 1 + r$ (with $r > 0$) and apply
Theorem I.2.1 to obtain

$q^n = (1 + r)^n = 1 + nr + \frac{n(n-1)}{2}\,r^2 + \ldots \ge 1 + nr$.

Therefore, the terms $q^n$ tend to infinity (for a given $M$ choose $N > M/r$ in
(1.10)). The statement is trivial for $q = 1$.

For $|q| < 1$ we consider the sequence $s_n = (1/|q|)^n$, which tends to infinity
by the above considerations. For a given $\varepsilon > 0$ we put $M = 1/\varepsilon$ and apply (1.10)
to the sequence $\{s_n\}$. This proves the existence of an integer $N$ such that for all
$n \ge N$ we have $s_n > M$ or equivalently $|q^n| < \varepsilon$. This proves that $q^n \to 0$. For
$q = -1$ the sequence oscillates between $-1$ and $1$ and for $q < -1$ it is unbounded
and oscillating. □

The following theorem simplifies the computation of limits. 

(1.5) Theorem. Consider two convergent sequences $s_n \to s$ and $v_n \to v$. Then,
the sum, the product, and the quotient of the two sequences, taken term by term,
converge as well, and we have

(1.11)  $\lim_{n \to \infty} (s_n + v_n) = s + v$

(1.12)  $\lim_{n \to \infty} (s_n \cdot v_n) = s \cdot v$

(1.13)  $\lim_{n \to \infty} \Bigl(\frac{s_n}{v_n}\Bigr) = \frac{s}{v}$  if $v_n \ne 0$ and $v \ne 0$.

Proof. We begin with the proof of (1.11). We estimate

$|(s_n + v_n) - (s + v)| = |s_n - s + v_n - v| \le \underbrace{|s_n - s|}_{< \varepsilon} + \underbrace{|v_n - v|}_{< \varepsilon} < 2\varepsilon = \varepsilon'$

by the triangle inequality. For the proof to be logical this sequence of formulas
has to be read from back to front: given $\varepsilon' > 0$ arbitrarily small, we choose $\varepsilon > 0$
such that $2\varepsilon = \varepsilon'$. By hypothesis, the two sequences $\{s_n\}$ and $\{v_n\}$ converge to $s$
and $v$. This means that there exist $N_1$ and $N_2$ such that $|s_n - s| < \varepsilon$ for $n \ge N_1$
and $|v_n - v| < \varepsilon$ for $n \ge N_2$. If we choose $N = \max(N_1, N_2)$, we see that (1.4)
is satisfied for the sequence $\{s_n + v_n\}$. Once we are accustomed to this argument,
repeating these explanations will not be necessary.

For the proof of (1.12) we have to estimate $s_n v_n - sv$. Let us add and subtract
“mixed products” $-s v_n + s v_n$ such that

$|s_n v_n - sv| = |s_n v_n - s v_n + s v_n - sv| \le \underbrace{|v_n|}_{\le B} \cdot \underbrace{|s_n - s|}_{< \varepsilon} + |s| \cdot \underbrace{|v_n - v|}_{< \varepsilon} < (B + |s|)\,\varepsilon = \varepsilon'$.

Here, we have used Theorem 1.3 for the sequence $\{v_n\}$.

It is sufficient to prove (1.13) for the special case where $s_n = 1$ for all $n$, and
hence $s = 1$. The general result will then follow from (1.12) because $s_n/v_n$ is the
product of $(1/v_n)$ and $s_n$. We first observe that the values of $|v_n|$ cannot become
arbitrarily small. Indeed, if we put $\varepsilon = |v|/2$ in the definition of convergence, we
obtain $|v_n - v| < |v|/2$ (and hence also $|v_n| > |v|/2$) for sufficiently large $n$.
With this estimate, we now obtain

$\Bigl|\frac{1}{v_n} - \frac{1}{v}\Bigr| = \frac{|v - v_n|}{|v_n| \cdot |v|} \le \frac{2\,|v_n - v|}{|v|^2} < \frac{2\varepsilon}{|v|^2} = \varepsilon'$.

□
(1.6) Theorem. Assume that a sequence $\{s_n\}$ converges to $s$ and that $s_n \le B$ for
all sufficiently large $n$. Then, the limit also satisfies $s \le B$.

Proof. We shall show that $s > B$ leads to a contradiction. For this we put $\varepsilon =
s - B > 0$ and use (1.4). This implies that for sufficiently large $n$ we have

$s - s_n \le |s_n - s| < \varepsilon = s - B$,

so that $s_n > B$, which is in contradiction to our assumption. □

Remark. The analogous result for strict inequalities ($s_n < B$ for all $n$ implies
$s < B$) is wrong. This is seen by the counterexample $s_n = n/(n + 1) < 1$ with
$\lim_{n \to \infty} s_n = 1$.

Cauchy Sequences. Let us now tackle an important problem. The definition of
convergence (1.4) forces us to estimate $|s_n - s|$; the limit $s$ has to be known. But
what can we do if the limit $s$ is unknown, or, as in example (1.6), is not known to
arbitrary precision? It is then impossible to establish with rigor that $|s - s_n| < \varepsilon$ for
any $\varepsilon > 0$. To bypass this obstacle, Cauchy had the idea of replacing $|s_n - s| < \varepsilon$
in (1.4) by $|s_n - s_{n+k}| < \varepsilon$ for all the successors $s_{n+k}$ of $s_n$.

(1.7) Definition. A sequence $\{s_n\}$ is a Cauchy sequence if

(1.14)  $\forall \varepsilon > 0 \quad \exists N \ge 1 \quad \forall n \ge N \quad \forall k \ge 1 \qquad |s_n - s_{n+k}| < \varepsilon$.

FIGURE 1.2. Sequence (1.6) as a Cauchy sequence

Example. Fig. 1.2 illustrates condition (1.14) for the sequence (1.6). We see that,
e.g., for $\varepsilon = 0.11$ condition (1.14) is satisfied for $n \ge 17$. Similarly, it is also seen
that (1.14) is true for any $\varepsilon > 0$, because $1/(n + 2) + 1/(n + 3)$ tends to zero.

(1.8) Theorem (Cauchy 1821). A sequence $\{s_n\}$ of real numbers is convergent
(with a real number as limit) if and only if it is a Cauchy sequence.




It is an immediate consequence of $|s_n - s_{n+k}| \le |s_n - s| + |s - s_{n+k}| <
2\varepsilon$ that convergent sequences must be Cauchy sequences. A rigorous proof of
the converse implication, beyond Cauchy’s intuition, is only possible after having
understood the concept of irrational and real numbers. In contrast to the results
obtained until now (Theorems 1.3, 1.5, and 1.6), Theorem 1.8 is not true in the
setting of rational numbers. Consider, for example, the sequence

(1.15)  $\{1,\ 1.4,\ 1.41,\ 1.414,\ 1.4142,\ 1.41421, \ldots\}$.

It is indeed a Cauchy sequence (we have $|s_n - s_{n+k}| < 10^{-n+1}$), but its limit $\sqrt{2}$
is not rational.
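The sequence (1.15) can be generated in exact rational arithmetic, which makes both statements visible: the terms satisfy the Cauchy estimate, yet none of them squares to 2. This is a minimal sketch; the function name is hypothetical, and the terms are taken to be the decimal truncations of $\sqrt{2}$, as the listed values suggest.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncated(n):
    """n-th term of (1.15): sqrt(2) truncated to n - 1 decimals, as a rational."""
    scale = 10 ** (n - 1)
    # isqrt gives the integer part of sqrt(2 * scale^2) = scale * sqrt(2).
    return Fraction(isqrt(2 * scale * scale), scale)

terms = [sqrt2_truncated(n) for n in range(1, 31)]

# Cauchy estimate |s_n - s_{n+k}| < 10^(-n+1), checked for all available pairs.
for n in range(1, 21):
    for k in range(1, 11):
        assert abs(terms[n - 1] - terms[n + k - 1]) < Fraction(1, 10 ** (n - 1))

# Every term is rational and its square stays strictly below 2.
assert all(t * t < 2 for t in terms)
print([str(t) for t in terms[:4]])
```

The finite check is of course no proof; it only illustrates the estimate stated in the text.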


Construction of Real Numbers 

The more I meditate on the principles of the theory of functions — and I 
do this unremittingly — the stronger becomes my conviction that the foun- 
dations upon which these must be built are the truths of Algebra . . . 

(Weierstrass 1875, Werke, vol. 2, p. 235) 
Please forget everything you have learned in school; for you haven’t learned
it. . . . My daughters have been studying (chemistry) for several semesters
already, think they have learned differential and integral calculus in school,
and even today don’t know why $x \cdot y = y \cdot x$ is true.

(Landau 1930, Engl. transl. 1945)

$\sqrt{3}$ is thus only a symbol for a number which has yet to be found, but is not
its definition. This definition is, however, satisfactorily given by my method

(1.7, 1.73, 1.732, . . .)

(G. Cantor 1889)

... the definition of irrational numbers, on which geometric representa- 
tions have often had a confusing influence. ... I take in my definition a 
purely formal point of view, calling some given symbols numbers, so that 
the existence of these numbers is beyond doubt. (Heine 1872) 

At that point, my sense of dissatisfaction was so strong that I firmly re- 
solved to start thinking until I should find a purely arithmetic and abso- 
lutely rigorous foundation of the principles of infinitesimal analysis. ... I 
achieved this goal on November 24th, 1858, . . . but I could not really de- 
cide upon a proper publication, because, firstly, the subject is not easy to 
present, and, secondly, the material is not very fruitful. 

(Dedekind 1872) 

Demeaning Analysis to a mere game with symbols . . . 

(Du Bois-Reymond, Allgemeine Funktionentheorie, Tübingen 1882)

For many decades nobody knew how irrational numbers should be put into a rig- 
orous mathematical setting, how to grasp correctly what should be the “ultimate 
term” of a Cauchy sequence such as (1.15). This “Gordian knot” was finally re- 
solved independently by Cantor (1872), Heine (1872), Méray (1872) (and simi- 
larly by Dedekind 1872) by the following audacious idea: the whole Cauchy se- 
quence is declared “to be” the real number in question (see quotations). This 
means that we associate to a Cauchy sequence of rational numbers s n (henceforth 
called a rational Cauchy sequence ) a real number. 
This seems to resolve Theorem 1.8 in an elegant manner. But there remains
much to do: we shall have to identify different rational Cauchy sequences that
represent the same real number, define algebraic and order relations for these new
objects, and finally we shall find the proof of Theorem 1.8 more complicated than
we might have thought, because the terms $s_n$ in (1.14) may now themselves be
real numbers, i.e., rational Cauchy sequences. All these points have been worked
out in full detail by Landau (1930) in a famous book, where he admits himself that
many parts are “eine langweilige Mühe”.

Equivalence Relation. Suppose that

$\sqrt{2}$ is associated to $\{1.4;\ 1.41;\ 1.414;\ \ldots\}$
$\sqrt{3}$ is associated to $\{1.7;\ 1.73;\ 1.732;\ \ldots\}$,

then $\sqrt{2} \cdot \sqrt{3}$ should be associated to the sequence of the products

$\{2.38;\ 2.4393;\ 2.449048;\ \ldots\}$.

On the other hand, $\sqrt{6}$ is also associated to $\{2.4;\ 2.44;\ 2.449;\ \ldots\}$. So we have
to identify the two sequences.
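That the two sequences belong together can be seen numerically: their difference tends to zero. The sketch below uses a hypothetical helper `truncated_sqrt(a, digits)` for the decimal truncations and compares the products with the truncations of $\sqrt{6}$ in exact arithmetic; the bound $4 \cdot 10^{-d}$ used in the check is a rough estimate chosen here, not taken from the text.

```python
from fractions import Fraction
from math import isqrt

def truncated_sqrt(a, digits):
    """sqrt(a) truncated to `digits` decimals, as an exact rational number."""
    scale = 10 ** digits
    return Fraction(isqrt(a * scale * scale), scale)

def product_vs_sqrt6(digits):
    """Return (truncation of sqrt 2 times truncation of sqrt 3, truncation of sqrt 6)."""
    product = truncated_sqrt(2, digits) * truncated_sqrt(3, digits)
    return product, truncated_sqrt(6, digits)

for d in range(1, 13):
    product, direct = product_vs_sqrt6(d)
    # The two rational sequences differ by less than 4 * 10^(-d).
    assert abs(product - direct) < Fraction(4, 10 ** d)

print([float(product_vs_sqrt6(d)[0]) for d in (1, 2, 3)])
```

The shrinking difference is what the limit condition $s_n - v_n \to 0$ in the definition of equivalent sequences formalizes.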

Two rational Cauchy sequences $\{s_n\}$ and $\{v_n\}$ are called equivalent, if
$\lim_{n \to \infty} (s_n - v_n) = 0$, i.e., if

(1.16)  $\forall \varepsilon > 0 \quad \exists N \ge 1 \quad \forall n \ge N \qquad |s_n - v_n| < \varepsilon$.

We then write $\{s_n\} \sim \{v_n\}$. It is not difficult to check that (1.16) defines an
equivalence relation on the set of all rational Cauchy sequences. This means that
we have

$\{s_n\} \sim \{s_n\}$   (reflexive)

$\{s_n\} \sim \{v_n\} \implies \{v_n\} \sim \{s_n\}$   (symmetric)

$\{s_n\} \sim \{v_n\},\ \{v_n\} \sim \{w_n\} \implies \{s_n\} \sim \{w_n\}$   (transitive).

Therefore, it is possible to partition the set of rational Cauchy sequences into
equivalence classes,

$\overline{\{s_n\}} = \bigl\{ \{v_n\} \;\big|\; \{v_n\}$ is a rational Cauchy sequence and $\{v_n\} \sim \{s_n\} \bigr\}$.

Elements of equivalence classes are called representatives.

(1.9) Definition. Real numbers are equivalence classes of rational Cauchy
sequences, i.e.,

$\mathbb{R} = \bigl\{ \overline{\{s_n\}} \;\big|\; \{s_n\}$ is a rational Cauchy sequence $\bigr\}$.

The set $\mathbb{Q}$ of rational numbers can be interpreted as a subset of $\mathbb{R}$ in the
following way: if $r$ is an element of $\mathbb{Q}$ (abbreviated: $r \in \mathbb{Q}$), then the constant
sequence $\{r, r, r, \ldots\}$ is a rational Cauchy sequence. Hence, we identify the
rational number $r$ with the real number $\overline{\{r, r, \ldots\}}$.

Addition and Multiplication. In order to be able to work with $\mathbb{R}$, we have to
define the usual operations. Let $s = \overline{\{s_n\}}$ and $v = \overline{\{v_n\}}$ be two real numbers. We
then define their sum (difference), product (quotient) by

(1.17)  $s + v := \overline{\{s_n + v_n\}}, \qquad s \cdot v := \overline{\{s_n \cdot v_n\}}$.

We have to take some care with this definition. First of all, we have to ensure that
the sequences $\{s_n + v_n\}$ and $\{s_n \cdot v_n\}$ are rational Cauchy sequences (this follows
from $|(s_n + v_n) - (s_{n+k} + v_{n+k})| \le |s_n - s_{n+k}| + |v_n - v_{n+k}|$ for the sum
and is obtained as in the proof of Theorem 1.5 for the product). Then, we have
to prove that (1.17) is well-defined. If we choose different representatives of the
equivalence classes $s$ and $v$, say $\{s_n'\}$ and $\{v_n'\}$, then the results $s + v$ and $s \cdot v$
have to be the same. For this we have to prove that $s_n - s_n' \to 0$ and $v_n - v_n' \to 0$ imply
$(s_n + v_n) - (s_n' + v_n') \to 0$ and $(s_n \cdot v_n) - (s_n' \cdot v_n') \to 0$. But this is obtained
exactly as in the proof of Theorem 1.5.

In a next step, we have to verify the known rules of computation with
real numbers (commutativity, associativity, distributivity). Here begins Landau’s
“langweilige Mühe”. We omit these details and refer the reader either to Landau’s
marvelous book or to any introductory algebra text.

Order. Let $s = \overline{\{s_n\}}$ and $v = \overline{\{v_n\}}$ be two real numbers. We then define

(1.18)  $s < v \;:\iff\; \exists \varepsilon' > 0 \quad \exists M \ge 1 \quad \forall m \ge M \qquad s_m \le v_m - \varepsilon'$,
        $s \le v \;:\iff\; s < v$ or $s = v$

(here the number $\varepsilon'$ has to be rational in order to avoid an ambiguous definition).
The rather complicated definition of $s < v$ means that for sufficiently large $m$
the elements $s_m$ and $v_m$ have to be well separated. It also implies that the
relation is well defined. Obviously, it is not sufficient to require $s_m < v_m$ (the
sequences $\{1, 1/2, 1/3, 1/4, \ldots\}$ and $\{0, 0, 0, \ldots\}$ both represent the real number
0 and serve as a counterexample).

The relation $s \le v$ of (1.18) defines an order relation. This means that

$s \le s$   (reflexive)

$s \le v,\ v \le w \implies s \le w$   (transitive)

$s \le v,\ v \le s \implies s = v$   (antisymmetric).

We just indicate the proof of antisymmetry. Suppose that $s \le v$ and $v \le s$, but $s \ne
v$. Then, there exist positive rational numbers $\varepsilon_1'$ and $\varepsilon_2'$ such that $s_m \le v_m - \varepsilon_1'$
for $m \ge M_1$ and $v_m \le s_m - \varepsilon_2'$ for $m \ge M_2$. Hence, for $m \ge \max(M_1, M_2)$,
we have $\varepsilon_2' \le s_m - v_m \le -\varepsilon_1'$, which is a contradiction.

(1.10) Lemma. The order $\le$ of (1.18) is total, i.e., for any two real numbers $s$ and
$v$ with $s \ne v$ we have either $s < v$ or $v < s$.
Remark. $s \ne v$ is the negation of $s = v$, which is expressed by Eq. (1.16). In
order to formulate the negation of a statement like (1.16), we recall a little bit of
logic. Let $S(x)$ be a statement depending on $x \in A$ ($A$ is some set) and $\neg S(x)$ its
negation. Then, we have²

$\forall x \in A \;\; S(x)$  is the negation of  $\exists x \in A \;\; \neg S(x)$,

$\exists x \in A \;\; S(x)$  is the negation of  $\forall x \in A \;\; \neg S(x)$.

In order to obtain the negation of a long statement we have to reverse all quantifiers
($\forall \leftrightarrow \exists$) and replace the final statement by its negation. Hence, $s \ne v$ is obtained
from (1.16) as

(1.19)  $\exists \varepsilon > 0 \quad \forall N \ge 1 \quad \exists n \ge N \qquad |s_n - v_n| \ge \varepsilon$.

Proof of Lemma 1.10. Let $s = \overline{\{s_n\}}$ and $v = \overline{\{v_n\}}$ be two distinct real numbers,
such that (1.19) holds. We then put $\varepsilon' = \varepsilon/3$. Since $\{s_n\}$ and $\{v_n\}$ are Cauchy
sequences, there exists $N_1$ such that $|s_n - s_{n+k}| < \varepsilon'$ for $n \ge N_1$ and $k \ge 1$ and
there exists $N_2$ such that $|v_n - v_{n+k}| < \varepsilon'$ for $n \ge N_2$ and $k \ge 1$. We then put
$N = \max(N_1, N_2)$ and deduce from (1.19) the existence of an integer $n \ge N$
such that $|s_n - v_n| \ge \varepsilon$. There are two possibilities,

(1.20)  $s_n - v_n \ge \varepsilon$  or  $v_n - s_n \ge \varepsilon$.

FIGURE 1.3. Illustration of the two cases in (1.20)

For $k \ge 1$ the numbers $s_{n+k}$ and $v_{n+k}$ stay in the disks of radius $\varepsilon' = \varepsilon/3$ (see
Fig. 1.3). Therefore, (1.18) is satisfied with $M = N$ and we have $s > v$ in the first
case, whereas $v > s$ in the second case. □

Absolute Value. Once we have shown that the order is total (Lemma 1.10), it is
possible to define the absolute value of a number $s$ as being $s$ (for $s \ge 0$) and $-s$
(for $s < 0$). An easy consequence of this definition is that

(1.21)  $|s| = \overline{\{|s_n|\}}$  for  $s = \overline{\{s_n\}}$.

The triangle inequality $|s + v| \le |s| + |v|$ and all its consequences are valid for
real numbers.

Remark. In the Definitions and Theorems 1.1 through 1.7, we have not been very
precise about the concept of “number”. To be logically correct, they should have
been stated only for rational numbers. After having now introduced with much
pain the concept of real numbers, we can extend these definitions to real numbers
and check that the statements of the theorems remain valid also in the more general
context.

² The statement “all ($\forall$) polar bears are white” is wrong if there exists ($\exists$) at least one
colored (nonwhite) polar bear; and vice versa.

Proof of Theorem 1.8.

. . . until now these propositions were considered axioms.

(Méray 1869, see Dugac 1978, p. 82)

Let $\{s_i\}$ be a Cauchy sequence of real numbers, such that each $s_i$ itself is an
equivalence class of rational Cauchy sequences, i.e., $s_i = \overline{\{s_{in}\}_{n \ge 1}}$. The idea is
to choose for each $i$ a number becoming smaller and smaller (for example $1/(2i)$)
and to apply the definition of a rational Cauchy sequence in order to obtain

$\exists N_i \ge 1 \quad \forall n \ge N_i \quad \forall k \ge 1 \qquad |s_{in} - s_{i,n+k}| < \frac{1}{2i}$.

We then put $v_i := s_{i,N_i}$ and consider the rational sequence $\{v_i\}$ (see Fig. 1.4).

FIGURE 1.4. Convergence of a Cauchy sequence

a) We first prove that $|v_i - s_i| < 1/i$. By (1.21), the real number $|v_i - s_i|$ is
represented by the rational Cauchy sequence $\{|v_i - s_{im}|\}_{m \ge 1}$. Since, for $m \ge N_i$,

$|v_i - s_{im}| = |s_{i,N_i} - s_{im}| < \frac{1}{2i} = \frac{1}{i} - \frac{1}{2i}$,

it follows from (1.18) with $\varepsilon' = 1/(2i)$ that $|v_i - s_i| < 1/i$.

b) We next prove that $\{v_i\}$ is a rational Cauchy sequence. Observing that
$|v_i - v_{i+k}|$ does not change its value if it is considered as a rational or a real
number, we have

(1.22)  $|v_i - v_{i+k}| = |v_i - s_i + s_i - s_{i+k} + s_{i+k} - v_{i+k}| \le |v_i - s_i| + |s_i - s_{i+k}| + |s_{i+k} - v_{i+k}| < \frac{1}{i} + \varepsilon + \frac{1}{i+k} < 3\varepsilon$

for sufficiently large $i$ and for $k \ge 1$. The equivalence class of $\{v_n\}$, denoted by
$s := \overline{\{v_n\}}$, will be our candidate for the limit of $\{s_i\}$. It follows from (1.22) that
$|v_i - s| \le 3\varepsilon$ (for large enough $i$) so that $v_i \to s$.

c) We finally prove that $s_i \to s$. From parts (a) and (b) of this proof and from
the triangle inequality, we have

$|s_i - s| \le |s_i - v_i| + |v_i - s| < \frac{1}{i} + 3\varepsilon \le 4\varepsilon$

for sufficiently large $i$. Hence, $s_i \to s$, and Theorem 1.8 is proved. □

Monotone Sequences and Least Upper Bound

Our next aim is to prove rigorously the fact that a majorized monotonically
increasing sequence converges to a real limit. This result has been used repeatedly
in Chap. II, especially in Sect. II.10.

(1.11) Definition. Let $X$ be a subset of $\mathbb{R}$. A real number $\xi$ is called the least
upper bound (or supremum) of $X$ if

i) $\forall x \in X \quad x \le \xi$, and
ii) $\forall \varepsilon > 0 \quad \exists x \in X \quad x > \xi - \varepsilon$.

We then write $\xi = \sup X$.

Condition (i) expresses the fact that $\xi$ is an upper bound of $X$, whereas
condition (ii) means that $\xi - \varepsilon$ is no longer an upper bound, so that $\xi$ is really the
smallest of all upper bounds. Our next result investigates the existence of such a
supremum: “This Theorem is . . .” as Bolzano wrote in 1817, “. . . of the greatest
importance” (see Stolz 1881, p. 257). It is based on Theorem 1.8 and is not valid
in $\mathbb{Q}$ (the set $X = \{x \in \mathbb{Q} \mid x^2 < 2\}$ does not have a supremum in $\mathbb{Q}$).



FIGURE 1.5. Existence of the least upper bound for a monotone sequence


(1.12) Theorem. Let $X$ be a subset of $\mathbb{R}$ that is nonempty and majorized (i.e.,
$\exists B \ \forall x \in X \ \ x \le B$). Then, there exists a real number $\xi$ such that $\xi = \sup X$.

Proof. On Bolzano’s tracks (but also on Euclid’s, Elements, Book X), we do the
proof by bisection. We shall construct nested intervals $[\alpha_n, \beta_n]$ with lengths
decreasing geometrically to zero, such that $\alpha_n$ is not an upper bound of $X$ but $\beta_n$ is
one.

Since $X$ is nonempty, we can find a number $\alpha_0$ that is not an upper bound
(choose an element $x$ of $X$ and take $\alpha_0$ to the left of $x$). Our second assumption
($X$ is majorized) implies the existence of an upper bound. We choose one and call
it $\beta_0$. The idea is then to consider the midpoint $\gamma = (\alpha_0 + \beta_0)/2$ (see Fig. 1.5).
There are two possibilities: either $\gamma$ is an upper bound of $X$ (in this case, we
set $\alpha_1 := \alpha_0$ and $\beta_1 := \gamma$) or it is not (then, we put $\alpha_1 := \gamma$ and $\beta_1 := \beta_0$).
Repeating this procedure, we find a sequence of intervals $[\alpha_n, \beta_n]$ with lengths

$$\beta_n - \alpha_n = (\beta_0 - \alpha_0)/2^n.$$

By construction we see that all successors of $\alpha_n$ and $\beta_n$ lie inside the interval
$[\alpha_n, \beta_n]$. Consequently, we have the estimates

$$|\alpha_{n+k} - \alpha_n| \le \beta_n - \alpha_n = \frac{\beta_0 - \alpha_0}{2^n}, \qquad |\beta_{n+k} - \beta_n| \le \beta_n - \alpha_n = \frac{\beta_0 - \alpha_0}{2^n}.$$

This shows that $\{\alpha_n\}$ and $\{\beta_n\}$ are Cauchy sequences. By Theorem 1.8, they are
convergent, and, since $\beta_n - \alpha_n = (\beta_0 - \alpha_0)/2^n \to 0$, they have the same limit
$\xi$ (Theorem 1.5). It now follows from Theorem 1.6 that $\xi$ is an upper bound of $X$
($x \le \beta_n$ implies $x \le \xi$). Furthermore, for a given $\varepsilon > 0$, there is an $\alpha_n$ satisfying
$\alpha_n > \xi - \varepsilon$. Since $\alpha_n$ is not an upper bound of $X$, $\xi - \varepsilon$ cannot be one either. □
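The bisection in this proof can be tried out numerically. The following is only a sketch under stated assumptions: the set $X$ is given through a hypothetical predicate `is_upper_bound` (not part of the text), exact rational arithmetic is used, and the example set is $X = \{x \in \mathbb{Q} \mid x^2 < 2\}$ from above, for which a rational $b$ is an upper bound exactly when $b > 0$ and $b^2 \ge 2$.

```python
from fractions import Fraction

def supremum_by_bisection(is_upper_bound, alpha0, beta0, steps):
    """Nested intervals [alpha, beta]: alpha is never an upper bound, beta always is."""
    alpha, beta = Fraction(alpha0), Fraction(beta0)
    assert not is_upper_bound(alpha) and is_upper_bound(beta)
    for _ in range(steps):
        gamma = (alpha + beta) / 2          # midpoint of the current interval
        if is_upper_bound(gamma):
            beta = gamma                    # keep the left half
        else:
            alpha = gamma                   # keep the right half
    return alpha, beta

def bounds_sqrt2_set(b):
    """Hypothetical test: is b an upper bound of {x in Q | x^2 < 2}?"""
    return b > 0 and b * b >= 2

alpha, beta = supremum_by_bisection(bounds_sqrt2_set, 1, 2, 40)
print(float(alpha), float(beta))            # both close to the square root of 2
```

After 40 steps the interval has length $2^{-40}$; both endpoints are rational, while their common limit $\sqrt{2}$ is not, which is the reason the theorem fails in $\mathbb{Q}$.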


(1.13) Theorem. Consider a sequence $\{s_n\}$ that is monotonically increasing
($s_n \le s_{n+1}$) and majorized ($s_n \le B$ for all $n$). Then, it converges to a real
limit.

Proof. By hypothesis, the set $X = \{s_1, s_2, s_3, \ldots\}$ is nonempty and majorized
(see Fig. 1.5). Therefore, $\xi = \sup X$ exists by Theorem 1.12. By the definition of
$\sup X$, the value $\xi - \varepsilon$ is, for a given $\varepsilon > 0$, not an upper bound of $X$.
Consequently, there exists an $N$ such that $s_N > \xi - \varepsilon$. Since $X$ is majorized by $\xi$ we
have

$$\xi - \varepsilon < s_N \le s_{N+1} \le s_{N+2} \le s_{N+3} \le \cdots \le \xi$$

so that $\xi - \varepsilon < s_n \le \xi$ (and thus $|s_n - \xi| < \varepsilon$) for all $n \ge N$. This proves the
convergence of $\{s_n\}$ to $\xi$. □

(1.14) Corollary. Consider two sequences $\{s_n\}$ and $\{v_n\}$. Suppose that $\{s_n\}$ is
monotonically increasing ($s_n \le s_{n+1}$) and that $s_n \le v_n$ for all (sufficiently large)
$n$. Then, we have

$\{v_n\}$ converges $\implies$ $\{s_n\}$ converges,

$\{s_n\}$ diverges $\implies$ $\{v_n\}$ diverges.


Proof. If $\{v_n\}$ converges, then it is bounded by Theorem 1.3. Hence, $\{s_n\}$ is also
bounded and its convergence follows from Theorem 1.13. The second line is the
contrapositive of the first one. □

Remark. In an analogous way, we define the lower bound of a set, we define
minorized and monotonically decreasing sequences, and we use the notation

$$\xi = \inf X \tag{1.23}$$

for the greatest lower bound or infimum of $X$ (i.e., $x \ge \xi$ for all $x \in X$ and
$\forall \varepsilon > 0 \ \exists x \in X$ with $x < \xi + \varepsilon$). There are theorems analogous to Theorems
1.12 and 1.13.


Accumulation Points 

I find it really surprising that Mr. Weierstrass and Mr. Kronecker can attract
so many students (between 15 and 20) to lectures that are so difficult
and at such a high level.

(letter of Mittag-Leffler 1875, see Dugac 1978, p. 68) 

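The sequence

$$\left\{ \tfrac{3}{2},\ -\tfrac{1}{2},\ \tfrac{4}{3},\ -\tfrac{1}{3},\ \tfrac{5}{4},\ -\tfrac{1}{4},\ \tfrac{6}{5},\ -\tfrac{1}{5},\ \ldots \right\} \tag{1.24}$$

does not converge, but if every other term is removed, it converges either to 0 or
to 1. A sequence with missing terms is a “subsequence”. More precisely,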


(1.15) Definition. A sequence $\{s'_n\}$ is called a subsequence of $\{s_n\}$ if there exists
an increasing mapping $\sigma : \mathbb{N} \to \mathbb{N}$ with $s'_n = s_{\sigma(n)}$ (increasing means that
$\sigma(n) < \sigma(m)$ if $n < m$).

(1.16) Definition. A point $s$ is called an accumulation point of a sequence $\{s_n\}$,
if there exists a subsequence converging to $s$.


Examples. The points 0 and 1 are accumulation points of the sequence (1.24). An
interesting example is the sequence

$$\left\{ \tfrac12,\ \tfrac13, \tfrac23,\ \tfrac14, \tfrac24, \tfrac34,\ \tfrac15, \tfrac25, \tfrac35, \tfrac45,\ \tfrac16, \tfrac26, \tfrac36, \tfrac46, \tfrac56,\ \tfrac17, \tfrac27, \tfrac37, \tfrac47, \tfrac57, \tfrac67,\ \tfrac18, \ldots \right\}, \tag{1.25}$$

which admits all numbers between 0 and 1 (0 and 1 included) as accumulation
points. To see that, for example, $\ln 2$ is an accumulation point of (1.25), consider
the sequence

$$\left\{ \tfrac{6}{10},\ \tfrac{69}{100},\ \tfrac{693}{1000},\ \tfrac{6931}{10000},\ \tfrac{69314}{100000},\ \tfrac{693147}{1000000},\ \ldots \right\}.$$

It is certainly included somewhere in (1.25) and converges to $\ln 2$.

The unbounded sequences $\{1, 2, 3, 4, 5, \ldots\}$, $\{-1, -2, -3, -4, -5, \ldots\}$ and
$\{1, -1, 2, -2, 3, -3, 4, -4, \ldots\}$ don’t have accumulation points.


(1.17) Theorem of Bolzano-Weierstrass (Weierstrass’s lecture of 1874).

A bounded sequence $\{s_n\}$ has at least one accumulation point.

Proof. Weierstrass’s original proof used bisection, as in the proof of Theorem 1.12.
Having this theorem at our disposal, we consider the set

$$X = \{x \mid s_n > x \text{ for infinitely many } n\}, \tag{1.26}$$

and simply put $\xi = \sup X$, which will turn out to be an accumulation point (see
Fig. 1.6). This number exists because $X$ is nonempty and majorized (the sequence
$\{s_n\}$ is bounded). By definition of the supremum, only a finite number of $s_n$ can
satisfy $s_n > \xi + \varepsilon$ and there is an infinity of terms $s_n$ that are larger than $\xi - \varepsilon$ ($\varepsilon$
is an arbitrary positive number). Hence, an infinity of terms $s_n$ lie in the interval
$[\xi - \varepsilon, \xi + \varepsilon]$.

FIGURE 1.6. Proof of the theorem of Bolzano-Weierstrass

We now choose arbitrarily an element of the sequence that lies in $[\xi - 1, \xi + 1]$
and we denote it by $s'_1 = s_{\sigma(1)}$. Then, we choose an element in $[\xi - 1/2, \xi + 1/2]$
whose index is larger than $\sigma(1)$ (this is surely possible since there must be
infinitely many) and we denote it by $s'_2 = s_{\sigma(2)}$. At the $n$th step, we choose for
$s'_n = s_{\sigma(n)}$ an element of the sequence that lies in $[\xi - 1/n, \xi + 1/n]$ and whose
index is larger than $\sigma(n-1)$. The subsequence obtained in this way converges to
$\xi$, because $|s'_n - \xi| \le 1/n$. □

Remark. This proof did not exhibit an arbitrary accumulation point but precisely
the largest accumulation point. We call it the “limit superior” of the sequence and
we denote it by

$$\xi = \limsup_{n\to\infty} s_n = \sup\{x \in \mathbb{R} \mid s_n > x \text{ for infinitely many } n\} \tag{1.27}$$

(see also Exercise 1.12). The smallest accumulation point is denoted by

$$\xi = \liminf_{n\to\infty} s_n = \inf\{x \in \mathbb{R} \mid s_n < x \text{ for infinitely many } n\}. \tag{1.28}$$

Example. For the sequence $\{\tfrac32, -\tfrac12, \tfrac43, -\tfrac13, \tfrac54, -\tfrac14, \tfrac65, -\tfrac15, \ldots\}$, we have

$$\limsup_{n\to\infty} s_n = 1, \quad \liminf_{n\to\infty} s_n = 0, \quad \sup\{s_n\} = 3/2, \quad \inf\{s_n\} = -1/2.$$

Exercises 

1.1 (Triangle inequality). Show, by discussing all possible combinations of signs,
that for any two real numbers $u$ and $v$ we have

$$|u + v| \le |u| + |v|. \tag{1.29}$$

Then, show that for any three real numbers $u$, $v$, and $w$ we have

$$|u - w| \le |u - v| + |v - w|. \tag{1.29'}$$

1.2 Show that the sequence $\{s_n\}$ with

$$s_n = \frac{2n - 1}{n + 3}$$

converges to $s = 2$. For a given $\varepsilon > 0$, say for $\varepsilon = 10^{-5}$, find a number $N$
such that $|s_n - s| < \varepsilon$ for $n \ge N$.

1.3 Show that the sequences

$$s_n = \frac{1}{1\cdot 5} + \frac{1}{3\cdot 7} + \frac{1}{5\cdot 9} + \frac{1}{7\cdot 11} + \cdots + \frac{1}{(2n-1)(2n+3)}$$

$$s_n = \frac{1}{1\cdot 2\cdot 3} + \frac{1}{2\cdot 3\cdot 4} + \frac{1}{3\cdot 4\cdot 5} + \cdots + \frac{1}{n(n+1)(n+2)}$$

are Cauchy sequences and find their limits.

Hint. Decompose the rational functions into partial fractions.

1.4 Construct sequences $\{s_n\}$ and $\{v_n\}$ with $\lim s_n = \infty$ and $\lim v_n = 0$ to
illustrate each of the following possibilities.

a) $\lim (s_n \cdot v_n) = \infty$;

b) $\lim (s_n \cdot v_n) = c$, where $c$ is an arbitrary constant; and

c) $s_n \cdot v_n$ is bounded but not convergent.

1.5 Consider the three sequences

$$s_n = \sqrt{n + 1000} - \sqrt{n}, \qquad v_n = \sqrt{n + \sqrt{n}} - \sqrt{n}, \qquad u_n = \sqrt{n + \frac{n}{1000}} - \sqrt{n}.$$

Show that $s_n > v_n > u_n$ for $n < 10^6$ and compute $\lim s_n$, $\lim v_n$,
$\lim u_n$, if they exist. Arrange these limits in increasing order.

1.6 Show with the help of the estimates of Exercise I.2.5 that



is a Cauchy sequence. Find, for $\varepsilon = 10^{-5}$, an integer $N$ such that
$|v_n - v_{n+k}| < \varepsilon$ for $n \ge N$ and $k \ge 1$.

1.7 For two rational Cauchy sequences $\{a_n\}$ and $\{b_n\}$, we denote by $\{a_n \cdot b_n\}$
the sequence formed by the products term by term. Show

a) the sequence $\{a_n \cdot b_n\}$ is again a Cauchy sequence; and

b) if $\{a_n\} \sim \{s_n\}$ and $\{b_n\} \sim \{v_n\}$ as defined in (1.16), then $\{a_n \cdot b_n\} \sim
\{s_n \cdot v_n\}$. This shows that the product of two real numbers defined in (1.17)
is independent of the choice of the representatives.

1.8 Show the following: if $s$ is the only accumulation point of a bounded
sequence $\{s_n\}$, then the sequence is convergent and $\lim_{n\to\infty} s_n = s$. Show by
a counterexample that this property is not true for unbounded sequences.

1.9 (Cauchy 1821, p. 59; also called “Cesàro summation”). Let $\lim_{n\to\infty} a_n = a$
and

$$b_n = \frac{1}{n} \sum_{k=1}^{n} a_k.$$

Show that $\lim_{n\to\infty} b_n = a$.

1.10 Let $\alpha$ be an irrational number (for example, $\alpha = \sqrt{2}$). Consider the sequence
$\{s_n\}$ defined by

$$s_n = (n\alpha) \bmod 1,$$

i.e., $s_n \in (0, 1)$ is $n\alpha$ with the integer part removed. Compute $s_1, s_2, s_3,
s_4, \ldots$ and sketch these values. Show that every point in $[0, 1]$ is an
accumulation point of this sequence.

Hint. For $\varepsilon > 0$ and $n > 1/\varepsilon$ at least two points among $s_1, s_2, \ldots, s_{n+1}$ (call
them $s_k$ and $s_{k+\ell}$) are closer than $\varepsilon$. Then, the points $s_k, s_{k+\ell}, s_{k+2\ell}, \ldots$
form a grid with mesh size $< \varepsilon$.

Remark. At the beginning of the computer era, this procedure was the
standard method for creating pseudo random numbers.

1.11 Let $\{s_n\}$ and $\{v_n\}$ be two bounded sequences. Show that

$$\limsup_{n\to\infty}\, (s_n + v_n) \le \limsup_{n\to\infty} s_n + \limsup_{n\to\infty} v_n$$
$$\liminf_{n\to\infty}\, (s_n + v_n) \ge \liminf_{n\to\infty} s_n + \liminf_{n\to\infty} v_n.$$

Show with the help of examples that the inequality can be strict.

1.12 Prove that for a sequence $\{s_n\}$ we have

$$\limsup_{n\to\infty} s_n = \lim_{n\to\infty} v_n, \quad \text{where } v_n = \sup\{s_n, s_{n+1}, s_{n+2}, \ldots\}.$$

1.13 Compute all accumulation points of the sequence

$$\{s_n\} = \{p_{11}, p_{21}, p_{22}, p_{31}, p_{32}, p_{33}, p_{41}, p_{42}, \ldots\}, \qquad p_{k\ell} = \sum_{i=\ell}^{k} \frac{1}{i^2}.$$

Show that (see Eq. (I.5.23)) $\limsup s_n = \pi^2/6$ and that $\liminf s_n = 0$ (see
Fig. 1.7).



FIGURE 1.7. Sequence with a countable number of accumulation points 




III.2 Infinite Series


I shall devote all my efforts to bring light into the immense obscurity that 
today reigns in Analysis. It so lacks any plan or system, that one is really 
astonished that so many people devote themselves to it — and, still worse, 
it is absolutely devoid of any rigour. 

(Abel 1826, Oeuvres , vol. 2, p. 263) 
Cauchy is mad, and there is no way of being on good terms with him, 
although at present he is the only man who knows how mathematics should 
be treated. What he does is excellent, but very confused . . . 

(Abel 1826, Oeuvres, vol. 2, p. 259) 

Since Newton and Leibniz, infinite series

$$a_0 + a_1 + a_2 + a_3 + \cdots \tag{2.1}$$

have been the universal tool for all calculations (see Chap. I). We will make precise
here what (2.1) really represents. The idea is to consider the sequence $\{s_n\}$ of
partial sums

$$s_0 = a_0, \quad s_1 = a_0 + a_1, \quad \ldots, \quad s_n = \sum_{i=0}^{n} a_i, \tag{2.2}$$

and to apply the definitions and results of the preceding section. A classical
reference for infinite series is the book of Knopp (1922).

(2.1) Definition. We say that the infinite series (2.1) converges, if the sequence
$\{s_n\}$ of (2.2) converges. We write

$$\sum_{i=0}^{\infty} a_i = \lim_{n\to\infty} s_n \qquad \text{or} \qquad \sum_{i\ge 0} a_i = \lim_{n\to\infty} s_n.$$

FIGURE 2.1. “Geometric” view of the geometric series


(2.2) Example. Consider the geometric series whose $n$th partial sum is given by
$s_n = 1 + q + q^2 + \ldots + q^n$ (see Fig. 2.1). Multiplying this expression by $1 - q$,
most terms cancel, and we get (for $q \ne 1$)

$$s_n = 1 + q + q^2 + \ldots + q^n = \frac{1 - q^{n+1}}{1 - q}.$$

From Lemma 1.4, together with Theorem 1.5, we thus have

$$1 + q + q^2 + q^3 + q^4 + q^5 + \cdots \ \begin{cases} = \dfrac{1}{1 - q} & \text{if } |q| < 1, \\[1ex] \text{diverges} \to \infty & \text{if } q \ge 1, \\ \text{diverges} & \text{if } q \le -1. \end{cases}$$

Criteria for Convergence 

Usually it is not possible to find a simple expression for $s_n$ and it is difficult to
compute explicitly the limit of $\{s_n\}$. In this case, it is natural to apply Cauchy’s
criterion of Theorem 1.8 to the sequence of partial sums. Since $s_{n+k} - s_n =
a_{n+1} + a_{n+2} + \ldots + a_{n+k}$, we get

(2.3) Lemma. The infinite series (2.1) converges to a real number if and only if

$$\forall \varepsilon > 0 \quad \exists N \ge 0 \quad \forall n \ge N \quad \forall k \ge 1 \qquad |a_{n+1} + a_{n+2} + \ldots + a_{n+k}| < \varepsilon. \qquad \Box$$


Putting $k = 1$ in this criterion, we see that

$$\lim_{i\to\infty} a_i = 0 \tag{2.3}$$

is a necessary condition for the convergence of (2.1). However, (2.3) is not
sufficient for the convergence of (2.1). This can be seen with the counterexample

$$1 + \frac12 + \frac12 + \frac13 + \frac13 + \frac13 + \frac14 + \frac14 + \frac14 + \frac14 + \frac15 + \cdots \to \infty.$$

In what follows, we shall discuss some sufficient conditions for the convergence
of (2.1).

Leibniz’s Criterion. Consider an infinite series where the terms have alternating
signs

$$a_0 - a_1 + a_2 - a_3 + a_4 - \ldots = \sum_{i=0}^{\infty} (-1)^i a_i. \tag{2.4}$$


(2.4) Theorem (Leibniz 1682). Suppose that the terms $a_i$ of the alternating series
(2.4) satisfy for all $i$

$$a_i > 0, \qquad a_{i+1} \le a_i, \qquad \lim_{i\to\infty} a_i = 0;$$

then, the series (2.4) converges to a real value $s$ and we have the estimate

$$|s - s_n| \le a_{n+1}, \tag{2.5}$$

i.e., the error of the $n$th partial sum is not larger than the first neglected term.

FIGURE 2.2. Proof of Leibniz’s criterion


Proof. Denote by $s_n$ the $n$th partial sum of (2.4). It then follows from the
monotonicity assumption that $s_{2k+1} = s_{2k-1} + a_{2k} - a_{2k+1} \ge s_{2k-1}$ and that
$s_{2k+2} = s_{2k} - a_{2k+1} + a_{2k+2} \le s_{2k}$. From the positivity of $a_{2k+1}$, we have
$s_{2k+1} < s_{2k}$ so that, by combining these inequalities,

$$s_1 \le s_3 \le s_5 \le s_7 \le \cdots \le s_6 \le s_4 \le s_2 \le s_0$$

(see Fig. 2.2). Consequently, $s_{n+k}$ lies for all $k$ between $s_n$ and $s_{n+1}$, and we have

$$|s_{n+k} - s_n| \le |s_{n+1} - s_n| = a_{n+1}. \tag{2.6}$$

This implies the convergence of $\{s_n\}$ by Theorem 1.8, since $a_{n+1}$ tends to 0 for
$n \to \infty$. Finally, the estimate (2.5) is obtained by considering the limit $k \to \infty$ in
(2.6) (use Theorem 1.6). □
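The estimate (2.5) and the interlacing of odd and even partial sums can be observed numerically. The following sketch is an illustration under assumptions: it uses the series $1 - 1/2 + 1/3 - \ldots$ (so $a_i = 1/(i+1)$), works in floating point, and takes the value $\ln 2$ of the sum for granted, which at this point has not been proved. The helper name `alternating_partial_sums` is hypothetical.

```python
import math

def alternating_partial_sums(a, n_max):
    """Return [s_0, ..., s_n_max] for the series a(0) - a(1) + a(2) - ..."""
    sums, total = [], 0.0
    for i in range(n_max + 1):
        total += (-1) ** i * a(i)
        sums.append(total)
    return sums

def a(i):
    return 1.0 / (i + 1)                     # terms of 1 - 1/2 + 1/3 - ...

s = alternating_partial_sums(a, 1000)
limit = math.log(2)                          # assumed value of the sum

# Estimate (2.5): the error of s_n is at most the first neglected term.
errors_ok = all(abs(limit - s[n]) <= a(n + 1) for n in range(1000))
# Odd partial sums increase, even partial sums decrease.
odd_increasing = all(s[2 * k + 1] <= s[2 * k + 3] for k in range(498))
even_decreasing = all(s[2 * k + 2] <= s[2 * k] for k in range(499))
print(errors_ok, odd_increasing, even_decreasing)
```

The bound (2.5) is often pessimistic by about a factor of two for this series, but it has the advantage of requiring no knowledge of the limit.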

Examples. The convergence of (see (I.4.29) and (I.3.13a))

$$1 - \frac13 + \frac15 - \frac17 + \frac19 - \cdots \qquad \text{and} \qquad 1 - \frac12 + \frac13 - \frac14 + \frac15 - \cdots$$

is thus established. However, we have not yet rigorously proved that the first sum
represents $\pi/4$ and the second one $\ln 2$ (see Example 7.11 below).

If a continued fraction (I.6.7) is converted into an infinite series, we obtain
(see Eq. (I.6.16))

$$q_0 + \frac{p_1}{B_1} - \frac{p_1 p_2}{B_1 B_2} + \frac{p_1 p_2 p_3}{B_2 B_3} - \frac{p_1 p_2 p_3 p_4}{B_3 B_4} + \cdots$$

Assuming that the integers $p_i$ and $q_i$ are positive, this is an alternating series (from
the second term onward). Furthermore, we have $B_k = q_k B_{k-1} + p_k B_{k-2} >
p_k B_{k-2}$, implying that the terms of the series are monotonically decreasing. Under
the additional assumption that $0 < p_i \le q_i$ for all $i \ge 1$ (see Theorem I.6.4), we
have

$$B_k B_{k-1} = q_k B_{k-1}^2 + p_k B_{k-1} B_{k-2} \ge 2\, p_k B_{k-1} B_{k-2}$$

and consequently also $B_k B_{k-1} \ge 2^{k-1} p_k p_{k-1} \cdot \ldots \cdot p_1$. This proves that the terms
of the series tend to zero and, by Theorem 2.4, that the series under consideration
converges.

Majorizing or Minorizing a Series. For infinite series with non-negative terms
the following criterion is extremely useful.

(2.5) Theorem. Suppose that $0 \le a_i \le b_i$ for all (sufficiently large) $i$. Then

$\sum_{i=0}^{\infty} b_i$ converges $\implies$ $\sum_{i=0}^{\infty} a_i$ converges,

$\sum_{i=0}^{\infty} a_i$ diverges $\implies$ $\sum_{i=0}^{\infty} b_i$ diverges.

Proof. Putting $s_n = \sum_{i=0}^{n} a_i$ and $v_n = \sum_{i=0}^{n} b_i$, this result is an immediate
consequence of Corollary 1.14. □


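As a first application, we give an easy proof of the divergence of the harmonic
series $\sum_{i\ge 1} \frac{1}{i}$ (N. Oresme, around 1350; see Struik 1969, p. 320). We minorize
this series as follows:

$$\sum b_i = 1 + \frac12 + \frac13 + \frac14 + \frac15 + \frac16 + \frac17 + \frac18 + \frac19 + \cdots$$

$$\sum a_i = 1 + \frac12 + \underbrace{\frac14 + \frac14}_{1/2} + \underbrace{\frac18 + \frac18 + \frac18 + \frac18}_{1/2} + \frac{1}{16} + \cdots$$

Since $\sum a_i$ diverges, it follows from $0 \le a_i \le b_i$ that the harmonic series $\sum b_i$
diverges too.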
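Oresme's grouping says that the first $2^k$ terms of the harmonic series add up to at least $1 + k/2$. A small check of this bound, as a sketch with a hypothetical helper `harmonic_partial_sum` and exact fractions:

```python
from fractions import Fraction

def harmonic_partial_sum(n):
    """1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

# For each k, compare the sum of the first 2^k terms with the minorant 1 + k/2.
checks = [(k, harmonic_partial_sum(2 ** k) >= 1 + Fraction(k, 2))
          for k in range(11)]
print(all(ok for _, ok in checks))
print(float(harmonic_partial_sum(2 ** 10)))   # grows, but only logarithmically
```

The partial sums exceed any bound, yet very slowly: roughly doubling the number of terms is needed to gain another $1/2$.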

As a further example, we consider the series (I.2.18) for $e^x$ (e.g., for $x = 10$),

$$1 + 10 + \frac{10^2}{2!} + \frac{10^3}{3!} + \frac{10^4}{4!} + \frac{10^5}{5!} + \cdots \tag{2.7}$$

We omit the first 10 terms (this does not influence the convergence), and compare
the resulting series with the geometric series (Example 2.2 with $q = 10/11 < 1$)

$$\frac{10^{10}}{10!} + \frac{10^{11}}{11!} + \frac{10^{12}}{12!} + \cdots = \frac{10^{10}}{10!} \Bigl( 1 + \frac{10}{11} + \frac{10 \cdot 10}{11 \cdot 12} + \frac{10 \cdot 10 \cdot 10}{11 \cdot 12 \cdot 13} + \cdots \Bigr)$$

$$\le \frac{10^{10}}{10!} \Bigl( 1 + \frac{10}{11} + \frac{10^2}{11^2} + \frac{10^3}{11^3} + \cdots \Bigr).$$

The convergence of the geometric series implies the convergence of (2.7).
Similarly, one can prove that the series (I.2.18) converges for all $x$. This comparison
with the geometric series will be used on several occasions (see Criteria 2.10 and
2.11, Lemma 7.1, and Theorems 7.5 and 7.7).


(2.6) Lemma. The series

$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}} = 1 + \frac{1}{2^{\alpha}} + \frac{1}{3^{\alpha}} + \frac{1}{4^{\alpha}} + \cdots \tag{2.8}$$

converges for all $\alpha > 1$. It diverges for $\alpha \le 1$.


Proof. The divergence of the series for $\alpha = 1$ (harmonic series) has been
established above. For $\alpha < 1$ the individual terms become still larger, so that the series
diverges by Theorem 2.5.

We shall next prove the convergence of (2.8) for $\alpha = (k + 1)/k$, where $k \ge 1$
is an integer. The idea is to consider the series

$$1 - \frac{1}{\sqrt[k]{2}} + \frac{1}{\sqrt[k]{3}} - \frac{1}{\sqrt[k]{4}} + \frac{1}{\sqrt[k]{5}} - \cdots,$$

which converges by Leibniz’s criterion. The sum of two successive terms can be
minorized as follows:

$$\frac{1}{\sqrt[k]{2i-1}} - \frac{1}{\sqrt[k]{2i}} = \frac{\sqrt[k]{2i} - \sqrt[k]{2i-1}}{\sqrt[k]{2i-1} \cdot \sqrt[k]{2i}} \ge \frac{C_k}{i^{(k+1)/k}}, \tag{2.9}$$

where $C_k = 1/(k \cdot 2^{(k+1)/k})$ is a constant independent of $i$. The last inequality in
(2.9) is obtained from the identity $a^k - b^k = (a - b)(a^{k-1} + a^{k-2} b + a^{k-3} b^2 +
\ldots + b^{k-1})$ with $a = \sqrt[k]{2i}$ and $b = \sqrt[k]{2i-1}$ as follows:

$$\sqrt[k]{2i} - \sqrt[k]{2i-1} = \frac{1}{(2i)^{(k-1)/k} + \ldots + (2i-1)^{(k-1)/k}} \ge \frac{1}{k \cdot (2i)^{(k-1)/k}}.$$

Thus, by Theorem 2.5, the series (2.8) converges for $\alpha = (k + 1)/k$.

Finally, for an arbitrary $\alpha > 1$ there exists an integer $k$ with $\alpha > (k + 1)/k$.
Theorem 2.5 applied once more then shows convergence for all $\alpha > 1$. □


Absolute Convergence 


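Example. The series

$$1 - \frac12 + \frac13 - \frac14 + \frac15 - \frac16 + \cdots \tag{2.10}$$

is convergent by Leibniz’s criterion (actually to $\ln 2$). If we rearrange the series as
follows:

$$\underbrace{1 - \frac12}_{1/2} - \frac14 + \underbrace{\frac13 - \frac16}_{1/6} - \frac18 + \underbrace{\frac15 - \frac{1}{10}}_{1/10} - \frac{1}{12} + \underbrace{\frac17 - \frac{1}{14}}_{1/14} - \frac{1}{16} + \cdots,$$

we obtain

$$\frac12 - \frac14 + \frac16 - \frac18 + \frac{1}{10} - \cdots = \frac12 \Bigl( 1 - \frac12 + \frac13 - \frac14 + \frac15 - \cdots \Bigr),$$

which is now half as much as originally. This shows that the value of an infinite
sum can depend on the order of summation.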
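The halving can be watched numerically. The sketch below adds up the rearranged series in blocks of three terms, $\frac{1}{2m-1} - \frac{1}{4m-2} - \frac{1}{4m}$ for $m = 1, 2, 3, \ldots$; the helper name `rearranged_partial_sum` is an assumption of this illustration, and the value $\ln 2$ of (2.10) is taken for granted.

```python
import math

def rearranged_partial_sum(blocks):
    """Sum the first `blocks` groups 1/(2m-1) - 1/(4m-2) - 1/(4m) of the rearrangement."""
    total = 0.0
    for m in range(1, blocks + 1):
        total += 1.0 / (2 * m - 1) - 1.0 / (4 * m - 2) - 1.0 / (4 * m)
    return total

value = rearranged_partial_sum(100000)
print(value, 0.5 * math.log(2))   # the two numbers agree to several digits
```

Every term of (2.10) is used exactly once in the rearrangement; only the order has changed.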


(2.7) Definition. A series $\sum_{i=0}^{\infty} a'_i$ is a rearrangement of $\sum_{i=0}^{\infty} a_i$, if every term
of $\sum_{i=0}^{\infty} a_i$ appears in $\sum_{i=0}^{\infty} a'_i$ exactly once and conversely (this means that
there exists a bijective mapping $\sigma : \mathbb{N}_0 \to \mathbb{N}_0$ such that $a'_i = a_{\sigma(i)}$; here
$\mathbb{N}_0 = \{0, 1, 2, 3, 4, \ldots\}$).

Explanation. An elegant explanation for the above phenomenon was given by
Riemann (1854, Werke, p. 235, “. . . ein Umstand, welcher von den Mathematikern
des vorigen Jahrhunderts übersehen wurde . . .”). In fact, Riemann observed much
more: for any given real number $A$ it is possible to rearrange the terms of (2.10)
in such a way that the resulting series converges to $A$. The reason is that the sum
of the positive terms of (2.10) and the sum of the negative terms,

$$1 + \frac13 + \frac15 + \frac17 + \frac19 + \cdots \qquad \text{and} \qquad -\frac12 - \frac14 - \frac16 - \frac18 - \frac{1}{10} - \cdots,$$

are both divergent (or equivalently: the series (2.10) with each term replaced by
its absolute value diverges).

The idea is to take first the positive terms $1 + 1/3 + \ldots$ until the sum exceeds
$A$ (this certainly happens because the series with positive terms diverges). Then,
we take the negative terms until we are below $A$ (this certainly happens because
$-1/2 - 1/4 - \ldots$ diverges). Then, we go on adding positive terms until $A$ is again
exceeded, and so on. In this way, we obtain a rearranged series that converges to
$A$ (cf. examples in Fig. 2.3).
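Riemann's procedure is easily turned into a small program. The following is a sketch under assumptions: floating-point arithmetic, a hypothetical function name `rearrange_to_target`, and the target value $A = 1.3$ chosen only for illustration.

```python
def rearrange_to_target(target, n_terms):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ...; returns the list of partial sums."""
    next_odd, next_even = 1, 2        # denominators of the next unused positive/negative term
    total, partial_sums = 0.0, []
    for _ in range(n_terms):
        if total <= target:           # not above A: take the next positive term
            total += 1.0 / next_odd
            next_odd += 2
        else:                         # above A: take the next negative term
            total -= 1.0 / next_even
            next_even += 2
        partial_sums.append(total)
    return partial_sums

sums = rearrange_to_target(1.3, 100000)
print(sums[:6], sums[-1])
```

Each term of (2.10) is used at most once and in its original relative order within the positive and the negative terms; since both sub-series diverge, neither is ever exhausted, and the overshoot at each crossing of $A$ is bounded by the size of the term just added, which tends to zero.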


FIGURE 2.3. Rearrangements of the series (2.10)



(2.8) Definition. The series (2.1) is absolutely convergent if

$$|a_0| + |a_1| + |a_2| + |a_3| + \cdots$$

converges.

(2.9) Theorem (Dirichlet 1837b). If the series $\sum_{i=0}^{\infty} a_i$ is absolutely convergent,
then all its rearrangements converge to the same limit.

Proof. By Cauchy’s criterion, absolute convergence means that

$$\forall \varepsilon > 0 \quad \exists N \ge 0 \quad \forall n \ge N \quad \forall k \ge 1 \qquad |a_{n+1}| + |a_{n+2}| + \cdots + |a_{n+k}| < \varepsilon.$$

For a given $\varepsilon > 0$ and the corresponding $N \ge 0$ we choose an integer $M$ in
such a way that all terms $a_0, a_1, \ldots, a_N$ appear in the $M$th partial sum $s'_M =
\sum_{i=0}^{M} a'_i$ of the rearrangement. Therefore, in the difference $s_m - s'_m$, all the terms
$a_0, a_1, \ldots, a_N$ disappear (for $m \ge M$) and we have

$$|s_m - s'_m| \le |a_{N+1}| + |a_{N+2}| + \cdots + |a_{N+k}| < \varepsilon,$$

where $k$ is a sufficiently large integer. This shows that $s_m - s'_m \to 0$ and that the
rearrangement converges to the same limit as the original series. □


We next present two criteria for the absolute convergence of an infinite series.

(2.10) The Ratio Test (Cauchy 1821). If the terms $a_n$ of the series (2.1) satisfy

$$\limsup_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} < 1, \tag{2.11}$$

then the series is absolutely convergent. If $\liminf_{n\to\infty} |a_{n+1}|/|a_n| > 1$, then it
diverges.

Proof. Choose a number $q$ that satisfies $\limsup_{n\to\infty} |a_{n+1}|/|a_n| < q < 1$. Then,
only a finite number of quotients $|a_{n+1}|/|a_n|$ are larger than $q$ and we have

$$\exists N \ge 0 \quad \forall n \ge N \qquad \frac{|a_{n+1}|}{|a_n|} \le q.$$

This, in turn, implies $|a_{N+1}| \le q|a_N|$, $|a_{N+2}| \le q^2|a_N|$, $|a_{N+3}| \le q^3|a_N|$, etc.
Since the geometric series converges (we have $0 < q < 1$), the series $\sum_{i\ge 0} |a_i|$
also converges.

If $\liminf_{n\to\infty} |a_{n+1}|/|a_n| > 1$, then the sequence $\{|a_n|\}$ is monotonically
increasing for $n \ge N$ and the necessary condition (2.3) is not satisfied. □


Examples. The general term of the series for $e^x$ is $a_n = x^n/n!$. Here, we have
$|a_{n+1}|/|a_n| = |x|/(n + 1) \to 0$ so that the series (I.2.18) converges absolutely
for all real $x$. Similarly, the series for $\sin x$ and $\cos x$ converge absolutely for all $x$.

For the series (2.8) this criterion cannot be applied because $|a_{n+1}|/|a_n| =
(n/(n+1))^{\alpha} \to 1$.

(2.11) The Root Test (Cauchy 1821). If

$$\limsup_{n\to\infty} \sqrt[n]{|a_n|} < 1, \tag{2.12}$$

then the series (2.1) is absolutely convergent. If $\limsup_{n\to\infty} \sqrt[n]{|a_n|} > 1$, then it
diverges.

Proof. As in the proof of the ratio test, we choose a number $q < 1$ that is strictly
larger than $\limsup_{n\to\infty} \sqrt[n]{|a_n|}$. Hence,

$$\exists N \ge 0 \quad \forall n \ge N \qquad \sqrt[n]{|a_n|} \le q.$$

This implies $|a_n| \le q^n$ for $n \ge N$, and a comparison with the geometric series
yields the absolute convergence of $\sum_{n\ge 0} a_n$. If $\limsup_{n\to\infty} \sqrt[n]{|a_n|} > 1$, then the
condition (2.3) is not satisfied and the series cannot converge. □


Double Series 

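Consider a two-dimensional array of real numbers

$$\begin{array}{ccccccccc}
a_{00} & + & a_{01} & + & a_{02} & + & a_{03} & + \cdots = & s_0 \\
+ & & + & & + & & + & & + \\
a_{10} & + & a_{11} & + & a_{12} & + & a_{13} & + \cdots = & s_1 \\
+ & & + & & + & & + & & + \\
a_{20} & + & a_{21} & + & a_{22} & + & a_{23} & + \cdots = & s_2 \\
+ & & + & & + & & + & & + \\
\vdots & & \vdots & & \vdots & & \vdots & & \vdots \\
= & & = & & = & & = & & \\
v_0 & + & v_1 & + & v_2 & + & v_3 & + \cdots = & ???
\end{array} \tag{2.13}$$

and suppose we want to sum up all of them. There are many natural ways of doing
this. One can either add up the elements of the $i$th row, denote the result by $s_i$,
and then compute $\sum_{i=0}^{\infty} s_i$; or one can add up the elements of the $j$th column,
denote the result by $v_j$, and then compute $\sum_{j=0}^{\infty} v_j$. It is also possible to write all
elements in a linear arrangement. For example, we can start with $a_{00}$, then add the
elements $a_{ij}$ for which $i + j = 1$, then those with $i + j = 2$, and so on. This gives

$$a_{00} + (a_{10} + a_{01}) + (a_{20} + a_{11} + a_{02}) + (a_{30} + \ldots) + \ldots. \tag{2.14}$$

Here, we denote the pairs $(0,0), (1,0), (0,1), (2,0), \ldots$ by $\sigma(0), \sigma(1), \sigma(2),
\sigma(3), \ldots$, so that $\sigma$ is a map $\sigma : \mathbb{N}_0 \to \mathbb{N}_0 \times \mathbb{N}_0$, where $\mathbb{N}_0 \times \mathbb{N}_0 = \{(i, j) \mid i \in
\mathbb{N}_0, j \in \mathbb{N}_0\}$ is the so-called Cartesian product of $\mathbb{N}_0$ with $\mathbb{N}_0$. So, we define in
general,


(2.12) Definition. A series $\sum_{k=0}^{\infty} b_k$ is called a linear arrangement of the double
series (2.13) if there exists a bijective mapping $\sigma : \mathbb{N}_0 \to \mathbb{N}_0 \times \mathbb{N}_0$ such that
$b_k = a_{\sigma(k)}$.


The question now is: do the different possibilities of summation lead to the
same value? Do we have

$$s_0 + s_1 + \ldots = \sum_{i=0}^{\infty} \Bigl( \sum_{j=0}^{\infty} a_{ij} \Bigr) = \sum_{j=0}^{\infty} \Bigl( \sum_{i=0}^{\infty} a_{ij} \Bigr) = v_0 + v_1 + \ldots, \tag{2.15}$$

and do linear arrangements converge to the same value?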
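The counterexample of Fig. 2.4a shows that this is not true without some
additional assumptions.

$$\begin{array}{ccccccccc}
1 & - & 1 & + & 0 & + & 0 & + \ldots = & 0 \\
+ & & + & & + & & + & & + \\
0 & + & 1 & - & 1 & + & 0 & + \ldots = & 0 \\
+ & & + & & + & & + & & + \\
0 & + & 0 & + & 1 & - & 1 & + \ldots = & 0 \\
+ & & + & & + & & + & & + \\
0 & + & 0 & + & 0 & + & 1 & - \ldots = & 0 \\
+ & & + & & + & & + & & + \\
\vdots & & \vdots & & \vdots & & \vdots & & \vdots \\
1 & + & 0 & + & 0 & + & 0 & + \ldots = & 1 \ne 0
\end{array}$$

FIGURE 2.4a. Counterexample

FIGURE 2.4b. Double series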
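The counterexample of Fig. 2.4a can be verified mechanically. In the sketch below the array is encoded by a hypothetical function `a(i, j)` (1 on the diagonal, $-1$ immediately to the right of it, 0 elsewhere); since every row and every column contains at most two nonzero entries, finite sums over a sufficiently long range already give the exact row and column sums.

```python
def a(i, j):
    """Entry of the array in Fig. 2.4a."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def row_sum(i, width):
    return sum(a(i, j) for j in range(width))

def column_sum(j, height):
    return sum(a(i, j) for i in range(height))

n = 50
row_sums = [row_sum(i, n + 2) for i in range(n)]          # s_0, s_1, ...: all equal to 0
column_sums = [column_sum(j, n + 2) for j in range(n)]    # v_0 = 1, then 0, 0, ...
print(sum(row_sums), sum(column_sums))                    # 0 and 1
```

Note that $\sum_{i,j} |a_{ij}|$ is unbounded here (each row contributes 2), so the array does not satisfy any uniform bound on the sums of absolute values.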


(2.13) Theorem (Cauchy 1821, “Note VII”). Suppose for the double series (2.13)
that

$$\exists B \ge 0 \quad \forall m \ge 0 \qquad \sum_{i=0}^{m} \sum_{j=0}^{m} |a_{ij}| \le B. \tag{2.16}$$

Then, all the series in (2.15) are convergent and the identities of (2.15) are
satisfied. Furthermore, every linear arrangement of the double series converges to the
same value.

Proof. Let $b_0 + b_1 + b_2 + \ldots$ be a linear arrangement of the double series (2.13). The
sequence $\{\sum_{i=0}^{n} |b_i|\}$ is monotonically increasing and bounded (by assumption
(2.16)) so that $\sum_{i=0}^{\infty} |b_i|$, and hence also $\sum_{i=0}^{\infty} b_i$, converge. Analogously, we can
establish the convergence of $s_i = \sum_{j=0}^{\infty} a_{ij}$ and $v_j = \sum_{i=0}^{\infty} a_{ij}$.

Inspired by the proof of Theorem 2.9, we apply Cauchy’s criterion to the
series $\sum_{i=0}^{\infty} |b_i|$ and have

$$\forall \varepsilon > 0 \quad \exists N \ge 0 \quad \forall n \ge N \quad \forall k \ge 1 \qquad |b_{n+1}| + |b_{n+2}| + \cdots + |b_{n+k}| < \varepsilon.$$

For a given $\varepsilon > 0$ and the corresponding $N \ge 0$ we choose an integer $M$ in
such a way that all elements $b_0, b_1, \ldots, b_N$ are present in the box $0 \le i \le M$,
$0 \le j \le M$ (see Fig. 2.4b). With this choice, $b_0, b_1, \ldots, b_N$ appear in the sum
$\sum_{i=0}^{l} b_i$ (for $l \ge N$) as well as in $\sum_{i=0}^{m} \sum_{j=0}^{n} a_{ij}$ (for $m \ge M$ and $n \ge M$).
Hence, we have for $l \ge N$, $m \ge M$, $n \ge M$,

$$\Bigl| \sum_{i=0}^{m} \sum_{j=0}^{n} a_{ij} - \sum_{i=0}^{l} b_i \Bigr| \le |b_{N+1}| + \ldots + |b_{N+k}| < \varepsilon, \tag{2.17}$$

with a sufficiently large $k$. We set $s = \sum_{i=0}^{\infty} b_i$ and take the limits $l \to \infty$ and
$n \to \infty$ in (2.17). Then, we exchange the finite summations $\sum_{i=0}^{m} \sum_{j=0}^{n} \leftrightarrow
\sum_{j=0}^{n} \sum_{i=0}^{m}$ and take the limits $l \to \infty$ and $m \to \infty$. This yields, by Theorem
1.6,

$$\Bigl| \sum_{i=0}^{m} s_i - s \Bigr| \le \varepsilon \qquad \text{and} \qquad \Bigl| \sum_{j=0}^{n} v_j - s \Bigr| \le \varepsilon.$$

Hence $\sum_{i=0}^{\infty} s_i$ and $\sum_{j=0}^{\infty} v_j$ both converge to the same limit $s$. □


The Cauchy Product of Two Series

If we want to compute the product of two infinite series $\sum_{i=0}^{\infty} a_i$ and $\sum_{j=0}^{\infty} b_j$,
we have to add all elements of the two-dimensional array

$$\begin{array}{ccccc}
a_0 b_0 & a_0 b_1 & a_0 b_2 & a_0 b_3 & \cdots \\
a_1 b_0 & a_1 b_1 & a_1 b_2 & a_1 b_3 & \cdots \\
a_2 b_0 & a_2 b_1 & a_2 b_2 & a_2 b_3 & \cdots \\
a_3 b_0 & a_3 b_1 & a_3 b_2 & a_3 b_3 & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array} \tag{2.18}$$

If we arrange the elements as indicated in Eq. (2.14), we obtain the so-called
Cauchy product of the two series.


(2.14) Definition. The Cauchy product of the series $\sum_{i=0}^{\infty} a_i$ and $\sum_{j=0}^{\infty} b_j$ is
defined by

$$\sum_{n=0}^{\infty} \Bigl( \sum_{j=0}^{n} a_{n-j} \cdot b_j \Bigr) = a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots.$$


The question is whether the Cauchy product is a convergent series and
whether it really represents the product of the two series $\sum_{i\ge 0} a_i$ and $\sum_{j\ge 0} b_j$.

(2.15) Counterexample (Cauchy 1821). The series

$$1 - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{4}} + \frac{1}{\sqrt{5}} - \cdots$$

converges by Leibniz’s criterion. We consider the Cauchy product of this series
with itself. Since

$$\Bigl| \sum_{j=0}^{n} \frac{(-1)^{n-j}}{\sqrt{n+1-j}} \cdot \frac{(-1)^{j}}{\sqrt{j+1}} \Bigr| = \sum_{j=0}^{n} \frac{1}{\sqrt{n+1-j} \cdot \sqrt{j+1}} \ge \frac{2n+2}{n+2}$$

(the inequality is a consequence of $(n+1-x)(x+1) \le (1+n/2)^2$ for $0 \le x \le n$),
the necessary condition (2.3) for the convergence of the Cauchy product is not
satisfied (see Fig. 2.5). This example illustrates the fact that the Cauchy product
of two convergent series need not converge.

FIGURE 2.5. Divergence of the Cauchy product of Counterexample 2.15


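(2.16) Theorem (Cauchy 1821). If the two series $\sum_{i=0}^{\infty} a_i$ and $\sum_{j=0}^{\infty} b_j$ are
absolutely convergent, then their Cauchy product converges and we have

$$\Bigl( \sum_{i=0}^{\infty} a_i \Bigr) \cdot \Bigl( \sum_{j=0}^{\infty} b_j \Bigr) = \sum_{n=0}^{\infty} \Bigl( \sum_{j=0}^{n} a_{n-j} \cdot b_j \Bigr). \tag{2.19}$$

Proof. By hypothesis, we have $\sum_{i=0}^{\infty} |a_i| \le B_1$ and $\sum_{j=0}^{\infty} |b_j| \le B_2$. Therefore,
we have for the two-dimensional array (2.18) that for all $m \ge 0$

$$\sum_{i=0}^{m} \sum_{j=0}^{m} |a_i|\,|b_j| \le B_1 B_2,$$

and Theorem 2.13 can be applied. The sum of the $i$th row gives $s_i = a_i \cdot \sum_{j=0}^{\infty} b_j$
and $\sum_{i=0}^{\infty} s_i = (\sum_{i=0}^{\infty} a_i)(\sum_{j=0}^{\infty} b_j)$. By Theorem 2.13, the Cauchy product,
which is a linear arrangement of (2.18), also converges to this value. □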


Examples. For $|q| < 1$ consider the two series

$$1 + q + q^2 + q^3 + \cdots = \frac{1}{1 - q} \qquad \text{and} \qquad 1 - q + q^2 - q^3 + \cdots = \frac{1}{1 + q}.$$

Their Cauchy product is

$$1 + q^2 + q^4 + q^6 + \ldots = \frac{1}{1 - q^2},$$

which, indeed, is the product of $(1 - q)^{-1}$ and $(1 + q)^{-1}$.
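Both the example with the geometric series and Counterexample 2.15 can be reproduced with a few lines of code. The function `cauchy_product` below is a hypothetical helper that works on finite lists of coefficients; this is legitimate because the $n$th term of the Cauchy product only involves $a_0, \ldots, a_n$ and $b_0, \ldots, b_n$. The value $q = 1/2$ is chosen for illustration only.

```python
import math
from fractions import Fraction

def cauchy_product(a, b):
    """Terms c_n = sum_{j=0}^{n} a[n-j] * b[j] for n < min(len(a), len(b))."""
    n_terms = min(len(a), len(b))
    return [sum(a[n - j] * b[j] for j in range(n + 1)) for n in range(n_terms)]

# Absolutely convergent case: (1 + q + q^2 + ...) times (1 - q + q^2 - ...).
q = Fraction(1, 2)
geometric = cauchy_product([q ** i for i in range(12)], [(-q) ** i for i in range(12)])
print(geometric[:6])                 # 1, 0, q^2, 0, q^4, 0: only even powers survive

# Counterexample 2.15: 1 - 1/sqrt(2) + 1/sqrt(3) - ... multiplied by itself.
alt = [(-1) ** i / math.sqrt(i + 1) for i in range(200)]
squared = cauchy_product(alt, alt)
print(min(abs(c) for c in squared))  # the terms stay away from zero
```

In the second case every term satisfies $|c_n| \ge (2n+2)/(n+2) \ge 1$, so the necessary condition (2.3) fails, in agreement with the counterexample.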

The Cauchy product of the absolutely convergent series

$$e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!} \qquad \text{and} \qquad e^y = \sum_{j=0}^{\infty} \frac{y^j}{j!}$$

gives the series for $e^{x+y}$ (use the binomial identity of Theorem I.2.1).

Remark. The statement of Theorem 2.16 remains true if only one of the two
series is absolutely convergent and the second is convergent (F. Mertens 1875, see
Exercise 2.3).

Under the assumption that the series $\sum_{i\ge 0} a_i$, $\sum_{j\ge 0} b_j$ and also their Cauchy
product (Definition 2.14) converge, the identity (2.19) holds (Abel 1826, see
Exercise 7.9).

Exchange of Infinite Series and Limits 

At several places in Chap. I, we were confronted with the problem of
exchanging an infinite series with a limit (for example, for the derivation of the series
for $e^x$ in Sect. I.2 and of those for $\sin x$ and $\cos x$ in Sect. I.4). We considered
series $d_n = \sum_{j=0}^{\infty} s_{nj}$ depending on an integer parameter $n$, and used the fact
that $\lim_{n\to\infty} d_n = \sum_{j=0}^{\infty} \lim_{n\to\infty} s_{nj}$. Already in Sect. I.2 (after Eq. (I.2.17)), it
was observed that this is not always true and that some caution is necessary. The
following theorem states sufficient conditions for the validity of such an exchange.

(2.17) Theorem. Suppose that the elements of the sequence $\{s_{0j}, s_{1j}, s_{2j}, \ldots\}$ all
have the same sign and that $|s_{n+1,j}| \ge |s_{nj}|$ for all $n$ and $j$. If there exists a
bound $B$ such that $\sum_{j=0}^{\infty} |s_{nj}| \le B$ for all $n \ge 0$, then

$$\lim_{n\to\infty} \sum_{j=0}^{\infty} s_{nj} = \sum_{j=0}^{\infty} \lim_{n\to\infty} s_{nj}. \tag{2.20}$$

Proof. The idea is to reformulate the hypotheses in such a way that Theorem 2.13
is directly applicable. At the beginning of this section, we saw that every series
can be converted to an infinite sequence by considering the partial sums (2.2).
Conversely, if the partial sums $s_0, s_1, s_2, \ldots$ are given, we can uniquely define
elements $a_i$ such that $\sum_{i=0}^{n} a_i = s_n$. We just have to set $a_0 = s_0$ and $a_i =
s_i - s_{i-1}$ for $i \ge 1$.

Applying this idea to the sequence $\{s_{0j}, s_{1j}, s_{2j}, \ldots\}$, we define

$$a_{0j} := s_{0j}, \qquad a_{ij} := s_{ij} - s_{i-1,j}, \qquad \text{so that} \quad \sum_{i=0}^{n} a_{ij} = s_{nj}.$$

Replacing $s_{nj}$ by this expression, (2.20) becomes

$$\lim_{n\to\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{n} a_{ij} = \sum_{j=0}^{\infty} \lim_{n\to\infty} \sum_{i=0}^{n} a_{ij}. \tag{2.21}$$

Exchanging the summations in the expression on the left side of (2.21) (this is
permitted by Theorem 1.5), we see that (2.21) is equivalent to (2.15). Therefore,
we only have to verify condition (2.16). The assumptions on $\{s_{0j}, s_{1j}, \ldots\}$ imply
that the elements $a_{0j}, a_{1j}, \ldots$ all have the same sign. Hence, we have

$$\sum_{i=0}^{n} |a_{ij}| = |s_{nj}| \qquad \text{and} \qquad \sum_{i=0}^{n} \sum_{j=0}^{m} |a_{ij}| = \sum_{j=0}^{m} |s_{nj}| \le B.$$

By Theorem 2.13, this implies (2.21) and thus also (2.20). □


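(2.18) Example. We will give here a rigorous proof of Theorem 1.2.3. From the
binomial theorem, we have

$\left(1 + \frac{y}{n}\right)^n = 1 + y + \frac{y^2 \left(1 - \frac{1}{n}\right)}{1 \cdot 2} + \frac{y^3 \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)}{1 \cdot 2 \cdot 3} + \cdots$,

which is a series depending on the parameter $n$. We set

$s_{n0} = 1$, $s_{n1} = y$, $s_{n2} = \frac{y^2 \left(1 - \frac{1}{n}\right)}{2!}$, $s_{n3} = \frac{y^3 \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)}{3!}$, $\ldots$

For a fixed $j$ the elements of the sequence $\{s_{0j}, s_{1j}, \ldots\}$ all have the same sign,
and $\{|s_{0j}|, |s_{1j}|, \ldots\}$ is monotonically increasing. Furthermore, we have

$\sum_{j=0}^{\infty} |s_{nj}| \le \sum_{j=0}^{\infty} \frac{|y|^j}{j!} \le B$

because, by the ratio test, $\sum_{j=0}^{\infty} |y|^j/j!$ is a convergent series. Hence, Theorem 2.17 yields

$\lim_{n\to\infty} \left(1 + \frac{y}{n}\right)^n = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \frac{y^4}{4!} + \cdots$.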
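
The limit of Example 2.18 can also be observed numerically. The following Python sketch is only an illustration: the function names `compound` and `exp_partial_sum` and the sample value y = 1.5 are our own choices and do not come from the text. It compares $(1 + y/n)^n$ for growing $n$ with a partial sum of the series $1 + y + y^2/2! + \cdots$.

```python
import math

def compound(y: float, n: int) -> float:
    """Return (1 + y/n)**n, the expression whose limit is taken in Example 2.18."""
    return (1.0 + y / n) ** n

def exp_partial_sum(y: float, terms: int) -> float:
    """Return the sum of the first `terms` terms of 1 + y + y^2/2! + y^3/3! + ..."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= y / (j + 1)   # turns y^j/j! into y^(j+1)/(j+1)!
    return total

y = 1.5  # illustrative value, not taken from the text
for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, compound(y, n))
print("partial sum of the series:", exp_partial_sum(y, 30))
print("math.exp for comparison:  ", math.exp(y))
```

The printed values of `compound` increase with n and approach the value of the series, as the theorem predicts; the numerical experiment is of course no substitute for the proof.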


Exercises 

2.1 Compute the Cauchy product of the two series

$f(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$ and $g(y) = 1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots$

and find the series for $f(x)g(y) + g(x)f(y)$. Justify the computations. Does
the result seem familiar?

2.2 Show that the Cauchy product of the two divergent series

$(2 + 2 + 2^2 + 2^3 + 2^4 + \cdots)(-1 + 1 + 1 + 1 + 1 + \cdots)$

converges absolutely.

2.3 (Mertens 1875). Suppose that the series $\sum_{i=0}^{\infty} a_i$ is convergent and
that $\sum_{j=0}^{\infty} b_j$ is absolutely convergent. Prove that the Cauchy product of
Definition 2.14 is convergent and that (2.19) holds.

Hint. Put $c_n = \sum_{j=0}^{n} a_{n-j} b_j$ and apply the triangle inequality (but
only to the first sums) in the identity

i>- (x>)(i>)=x: 

i = 0 i =0 |=0 j = 0 





2.4 Determine the constants $a_1, a_2, a_3, a_4, \ldots$ so that the Cauchy product of the
two series

$(1 - a_1 + a_2 - a_3 + \cdots)(1 - a_1 + a_2 - a_3 + \cdots) = (1 - 1 + 1 - 1 + \cdots)$

becomes the divergent series $1 - 1 + 1 - \cdots$. Show that the series $1 - a_1 + a_2 - a_3 + \cdots$
converges (Fig. 2.6). Can it converge absolutely?

Hint. The use of the generating function for the numbers $1, -a_1, a_2, -a_3, \ldots$
reduces this exercise to a known formula of Chap. I and to Wallis’s product.


FIGURE 2.6. Divergence of the Cauchy product of Exercise 2.4


2.5 Justify Eq. (1.5.26) by taking the logarithm and applying the ideas of Exam- 
ple 2.18. 

III.3 Real Functions and Continuity

We call here Function of a variable magnitude, a quantity that is composed 
in any possible manner of this variable magnitude & of constants. 

(Joh. Bernoulli 1718, Opera, vol. 2, p. 241) 
Consequently, if $f(\frac{x}{a} + c)$ denotes an arbitrary function . . .

(Euler 1734, Opera, vol. XXII, p. 59) 
If now to any x there corresponds a unique, finite y, . . . then y is called a
function of x for this interval.. . . This definition does not require a com- 
mon rule for the different parts of the curve; one can imagine the curve as 
being composed of the most heterogeneous components or as being drawn 
without following any law. (Dirichlet 1837) 

Real functions $y = f(x)$ of a real variable $x$ were, since Descartes, the universal
tool for the study of geometric curves and, since Galilei and Newton, for mechanical
and astronomical calculations. The word “functio” was proposed by Leibniz
and Joh. Bernoulli; the symbol $y = f(x)$ was introduced by Euler (1734) (see quotations).
In the Leibniz-Bernoulli-Euler era, real functions were mainly thought of
as being composed of elementary functions (“expressio analytica quomodocunque
. . . . Sic $a + 3z$, $az - 4z^2$, $az + b\sqrt{a^2 - z^2}$, $c^z$ etc. sunt functiones ipsius $z$”, Euler
1748), perhaps with different formulas for different domains (“curvas discontinuas
seu mixtas et irregulares appellamus”). The 19th century, mainly under the influence
of Fourier’s heat equation and Dirichlet’s study of Fourier series, brought
a wider notion: “any sketched curve” or “any values $y$ defined in dependence of
the values $x$” (see the quotation above).


(3.1) Definition (Dirichlet 1837). A function $f : A \to B$ consists of two sets, the
domain $A$ and the range $B$, and of a rule that assigns to each $x \in A$ a unique
element $y \in B$. This correspondence is denoted by

$y = f(x)$ or $x \mapsto f(x)$.

We say that $y$ is the image of $x$ and that $x$ is an inverse image of $y$.


Throughout this section, the range will be $\mathbb{R}$ (or an interval) and the domain
will be an interval or a union of intervals of the form

$(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$ or $[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}$ or
$(a, b] = \{x \in \mathbb{R} \mid a < x \le b\}$ or $[a, \infty) = \{x \in \mathbb{R} \mid a \le x < \infty\}$ or ....


The interval $(a, b)$ is called open, while $[a, b]$ is closed.

As in the following examples, we usually use braces for functions that are 
defined by different expressions on different parts of A. 


Examples. 1. The function $f : [0, 1] \to \mathbb{R}$,

(3.1) $f(x) = \begin{cases} x & 0 \le x \le 1/2 \\ 1 - x & 1/2 \le x \le 1, \end{cases}$

is plotted on the right. We observe that some
$y \in \mathbb{R}$ have no inverse image, and that some
have more than one.

2. Our second function can be defined either
by a single expression, as a limit, or with
braces by separating three cases:

(3.2) $f(x) = \lim_{n\to\infty} \arctan(nx) = \begin{cases} \pi/2 & x > 0 \\ 0 & x = 0 \\ -\pi/2 & x < 0. \end{cases}$

(The plot shows $\arctan(nx)$ for $n = 1, 2, 4, 8, 16, \ldots$)


3. The following function, which is difficult
to plot, is due to Dirichlet (see Werke, vol. 2,
p. 132, 1829, “On aurait un exemple d’une
fonction . . .”):

(3.3) $f(x) = \begin{cases} 0 & x \text{ irrational} \\ 1 & x \text{ rational.} \end{cases}$

4. This function is of a similar nature to Dirichlet’s,
but the peaks become lower for increasing
denominators of $x$:

(3.4) $f(x) = \begin{cases} 0 & x \text{ irrational} \\ 1/q & x = p/q \text{ simplified fraction.} \end{cases}$


5. When $x$ tends to zero, $1/x$ tends to $\infty$,
therefore

(3.5) $f(x) = \begin{cases} \sin(1/x) & x \ne 0 \\ 0 & x = 0 \end{cases}$

will produce an infinity of oscillations in the
neighborhood of the origin (Cauchy 1821).

6. Here the oscillations close to the origin are
less violent, due to the factor $x$, but there are
still infinitely many (Weierstrass 1874):

(3.6) $f(x) = \begin{cases} x \cdot \sin(1/x) & x \ne 0 \\ 0 & x = 0. \end{cases}$


7. Our last example was proposed, according
to Weierstrass (1872), by Riemann (see
Sect. III.9 below) and is defined via an infinite
convergent sum:

(3.7) $f(x) = \sum_{n=1}^{\infty} \frac{\sin(n^2 x)}{n^2}$.


Continuous Functions

. . . $f(x)$ will be called a continuous function, if . . . the numerical values
of the difference

$f(x + \alpha) - f(x)$

decrease indefinitely with those of $\alpha$ . . .

(Cauchy 1821, Cours d’Analyse, p. 43)
Here we call a quantity $y$ a continuous function of $x$, if after choosing a
quantity $\varepsilon$ the existence of $\delta$ can be proved, such that for any value between
$x_0 - \delta \ldots x_0 + \delta$ the corresponding value of $y$ lies between $y_0 - \varepsilon \ldots y_0 + \varepsilon$.

(Weierstrass 1874)

Cauchy (1821) introduced the concept of continuous functions by requiring that
indefinitely small changes of $x$ should produce indefinitely small changes of $y$ (see
quotation). Bolzano (1817) and Weierstrass (1874) were more precise (second
quotation): the difference $f(x) - f(x_0)$ must be arbitrarily small, if the difference
$x - x_0$ is sufficiently small.


(3.2) Definition. Let $A$ be a subset of $\mathbb{R}$ and $x_0 \in A$. The function $f : A \to \mathbb{R}$ is
continuous at $x_0$ if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $x \in A$
satisfying $|x - x_0| < \delta$ we have $|f(x) - f(x_0)| < \varepsilon$, or in symbols:

$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in A : \; |x - x_0| < \delta \;\Rightarrow\; |f(x) - f(x_0)| < \varepsilon$.

The function $f(x)$ is called continuous, if it is continuous at all $x_0 \in A$.


See Fig. 3.1a for a continuous function and Figs. 3.1b-3.1f for functions with
discontinuities.

FIGURE 3.1. Continuous and discontinuous functions

Discussion of Examples (3.1) to (3.7). The function (3.1) is continuous everywhere,
even at $x_0 = 1/2$; the function (3.2) is discontinuous at 0; (3.3) is discontinuous
everywhere; (3.4) is continuous for irrational $x_0$ and discontinuous for
rational $x_0$ (Exercise 3.1); (3.5) is discontinuous at $x_0 = 0$; (3.6) is continuous
everywhere, even at $x = 0$; (3.7), which appears to exhibit violent variations, is
nevertheless everywhere continuous (as we shall see later in Theorem 4.2).
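
The behaviour of function (3.4) near an irrational point can be made visible with exact rational arithmetic. The sketch below is an illustration only; the name `thomae` and the choice of the point $\sqrt{2} - 1$ with its continued-fraction convergents are our assumptions, not part of the text. Rationals that approach an irrational number must have growing denominators, so the values $1/q$ shrink to zero.

```python
from fractions import Fraction

def thomae(x: Fraction) -> Fraction:
    """Value of function (3.4) at a rational x = p/q in lowest terms, namely 1/q."""
    return Fraction(1, x.denominator)   # Fraction always stores lowest terms

# Convergents p/q of sqrt(2) - 1 = [0; 2, 2, 2, ...], built with the
# recurrence p_{k+1} = 2 p_k + p_{k-1} (and the same for q).
p_prev, q_prev = 0, 1
p, q = 1, 2
convergents = []
for _ in range(8):
    convergents.append(Fraction(p, q))
    p_prev, p = p, 2 * p + p_prev
    q_prev, q = q, 2 * q + q_prev

for c in convergents:
    print(c, float(c), thomae(c))
```

The rationals printed in the first column approach $\sqrt{2} - 1 \approx 0.41421$, while the function values in the last column decrease towards 0, the value of (3.4) at the irrational limit point.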

(3.3) Theorem. A function $f : A \to \mathbb{R}$ is continuous at $x_0 \in A$ if and only if for
every sequence $\{x_n\}_{n \ge 1}$ with $x_n \in A$ we have

(3.8) $\lim_{n\to\infty} f(x_n) = f(x_0)$ if $\lim_{n\to\infty} x_n = x_0$.

Proof. For a given $\varepsilon > 0$, choose $\delta > 0$ as in Definition 3.2. Since $x_n \to x_0$, there
exists $N$ such that $|x_n - x_0| < \delta$ for $n \ge N$. By continuity at $x_0$, we then have
$|f(x_n) - f(x_0)| < \varepsilon$ for $n \ge N$ and (3.8) holds.

Suppose now that (3.8) holds, but that $f(x)$ is discontinuous at $x_0$. The negation
of continuity at $x_0$ is

$\exists \varepsilon > 0 \;\; \forall \delta > 0 \;\; \exists x \in A : \; |x - x_0| < \delta \;\wedge\; |f(x) - f(x_0)| \ge \varepsilon$.

The idea is to take $\delta = 1/n$ and to attach an index $n$ to $x$ (which depends on $\delta$).
This gives us a sequence $\{x_n\}$ with elements in $A$ such that $|x_n - x_0| < 1/n$
(hence $x_n \to x_0$) and at the same time $|f(x_n) - f(x_0)| \ge \varepsilon$. This contradicts
(3.8). □
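
The sequence criterion of Theorem 3.3 gives a quick way to detect a discontinuity: it suffices to exhibit one sequence $x_n \to x_0$ along which $f(x_n)$ does not tend to $f(x_0)$. The following sketch applies this to function (3.5), taking $f(0) = 0$ as an assumption; the two sequences `zeros` and `peaks` are our illustrative choices.

```python
import math

def f(x: float) -> float:
    """Function (3.5): sin(1/x) for x != 0; the value 0 at x = 0 is assumed here."""
    return math.sin(1.0 / x) if x != 0 else 0.0

# Two sequences tending to 0: on the first, sin(1/x) = sin(n*pi) = 0;
# on the second, sin(1/x) = sin(pi/2 + 2*n*pi) = 1.
zeros = [1.0 / (n * math.pi) for n in range(1, 6)]
peaks = [1.0 / (math.pi / 2 + 2 * n * math.pi) for n in range(1, 6)]

print([round(f(x), 12) for x in zeros])
print([round(f(x), 12) for x in peaks])
```

Both sequences converge to 0, but the function values stay near 0 along the first and near 1 along the second. Whatever value is assigned to $f(0)$, at least one of the two sequences violates (3.8), so (3.5) is discontinuous at the origin.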

(3.4) Theorem. Let $f : A \to \mathbb{R}$ and $g : A \to \mathbb{R}$ be continuous at $x_0 \in A$ and let
$\lambda$ be a real number. Then, the functions

$f + g$, $\lambda \cdot f$, $f \cdot g$, $f/g$ (if $g(x_0) \ne 0$)

are also continuous at $x_0$.

Proof. We take a sequence $\{x_n\}$ with elements in $A$ and converging to $x_0$. The
continuity of $f$ and $g$ implies that $f(x_n) \to f(x_0)$ and $g(x_n) \to g(x_0)$ for $n \to \infty$.
Theorem 1.5 then shows that

$f(x_n) + g(x_n) \to f(x_0) + g(x_0)$,

so that $f + g$ is seen to be continuous at $x_0$ (Theorem 3.3).

The continuity of the other functions can be deduced in the same way. □

Example. It is obvious that the constant function $f(x) = a$ is continuous. The
function $f(x) = x$ is continuous too (choose $\delta = \varepsilon$ in Definition 3.2). As a
consequence of Theorem 3.4, all polynomials $P(x) = a_0 + a_1 x + \cdots + a_n x^n$
are continuous, and rational functions $R(x) = P(x)/Q(x)$ are continuous at all
points $x_0$ where $Q(x_0) \ne 0$.

The Intermediate Value Theorem

This theorem has been known for a long time . . . 

(Lagrange 1807, Oeuvres vol. 8, p. 19, see also p. 133) 

This theorem appears geometrically evident and was used by Euler and Gauss 
without scruples (see quotation). Only Bolzano found that a “rein analytischer 
Beweis” was necessary to establish more rigor in Analysis and Algebra. 

(3.5) Theorem (Bolzano 1817). Let $f : [a, b] \to \mathbb{R}$ be a continuous function. If
$f(a) < c$ and $f(b) > c$, then there exists $\xi \in (a, b)$ such that $f(\xi) = c$.

Proof. We shall prove the statement for $c = 0$. The general result then follows
from this special case by considering $f(x) - c$ instead of $f(x)$.

The set $X = \{x \in [a, b] \; ; \; f(x) < 0\}$ is nonempty ($a \in X$) and it is
majorized by $b$. Hence, the supremum $\xi = \sup X$ exists by Theorem 1.12. We
shall show that $f(\xi) = 0$ (Fig. 3.2).

Assume that $f(\xi) = K > 0$. We put $\varepsilon = K/2 > 0$ and deduce from the
continuity of $f(x)$ at $\xi$ the existence of some $\delta > 0$ such that

$|f(x) - K| < K/2$ for $|x - \xi| < \delta$.

This implies that $f(x) > K/2 > 0$ for $\xi - \delta < x \le \xi$, which contradicts the fact
that $\xi$ is the supremum of $X$.

We exclude the case $f(\xi) = K < 0$ in a similar way. □
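
Theorem 3.5 only asserts the existence of $\xi$. To compute such a point in practice one may use repeated halving of the interval. Note that this bisection procedure is a different, constructive technique and not the supremum argument of the proof above; the function name `bisect_value`, the tolerance, and the sample function $x^3 + x$ are our own illustrative assumptions.

```python
from typing import Callable

def bisect_value(f: Callable[[float], float], a: float, b: float,
                 c: float = 0.0, tol: float = 1e-12) -> float:
    """Approximate a point xi in [a, b] with f(xi) = c.

    Assumes f is continuous on [a, b] with f(a) < c < f(b),
    as in the hypotheses of Theorem 3.5.
    """
    if not (f(a) < c < f(b)):
        raise ValueError("need f(a) < c < f(b)")
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < c:
            lo = mid      # keeps the invariant f(lo) < c
        else:
            hi = mid      # keeps the invariant f(hi) >= c
    return 0.5 * (lo + hi)

# Illustrative use: solve x^3 + x = 1 on [0, 2].
xi = bisect_value(lambda x: x**3 + x, 0.0, 2.0, c=1.0)
print(xi, xi**3 + xi)
```

Each step halves the interval while keeping $f(\text{lo}) < c \le f(\text{hi})$, so the two endpoints enclose a solution whose existence is guaranteed by the theorem.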



The Maximum Theorem 

With his theorem, which states that a continuous function of a real variable
actually attains its least upper and greatest lower bounds, i.e., necessarily
possesses a maximum and a minimum, Weierstrass created a tool which
today is indispensable to all mathematicians for more refined analytical or
arithmetical investigations.

(Hilbert 1897, Gesammelte Abh., vol. 3, p. 333)
The following theorem is called “Hauptlehrsatz” (“Principal Theorem”) in Weierstrass’
lectures of 1861 and was published by Cantor (1870).

(3.6) Theorem. If $f : [a, b] \to \mathbb{R}$ is a continuous function, then it is bounded
on $[a, b]$ and admits a maximum and a minimum, i.e., there exist $u \in [a, b]$ and
$U \in [a, b]$ such that

(3.9) $f(u) \le f(x) \le f(U)$ for all $x \in [a, b]$.


Discussion of the Assumptions. The function $f : (0, 1] \to \mathbb{R}$ defined by $f(x) = 1/x$
is not bounded on $A = (0, 1]$. Therefore, the assumption that the domain $A$
be closed is important.

The function $f : [0, \infty) \to \mathbb{R}$, given by $f(x) = x^2$, shows that the boundedness
of the domain of $f(x)$ is important.

The function $f : [0, 1] \to \mathbb{R}$ defined by $f(1/2) = 0$ and

$f(x) = (x - 1/2)^{-2}$ for $x \ne 1/2$

is discontinuous at $x = 1/2$ and unbounded. Hence, it is important to assume that
the function be continuous everywhere.

Our last example exhibits a function $f : [0, 1] \to \mathbb{R}$
which is bounded, but does not admit a maximum:

$f(x) = \begin{cases} -3x + \sin(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$

The supremum of the set $\{f(x) \mid x \in [0, 1]\}$ is equal
to 1, but there is no $U \in [0, 1]$ with $f(U) = 1$.
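
The last example can be explored numerically. The sketch below reads the function as $f(x) = -3x + \sin(1/x)$ for $x \ne 0$ and $f(0) = 0$ (our reading of the formula), and evaluates it at the points where $\sin(1/x) = 1$; the helper name `peak` is an illustrative assumption.

```python
import math

def f(x: float) -> float:
    """Last example of the discussion, read here as -3x + sin(1/x), with f(0) = 0."""
    return -3.0 * x + math.sin(1.0 / x) if x != 0 else 0.0

def peak(k: int) -> float:
    """Point x = 1 / (pi/2 + 2*k*pi), where sin(1/x) equals 1."""
    return 1.0 / (math.pi / 2 + 2 * k * math.pi)

values = [f(peak(k)) for k in (0, 1, 10, 100, 1000)]
print(values)
```

The values increase towards 1 as the points approach the origin, but each of them equals $1 - 3x$ with $x > 0$ and is therefore strictly smaller than 1, in agreement with the statement that the supremum is not attained.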



Proof of Theorem 3.6. We first prove that $f(x)$ is bounded on $[a, b]$. We suppose
the contrary:

(3.10) $\forall n \ge 1 \;\; \exists x_n \in [a, b] \quad |f(x_n)| > n$.

The sequence $x_1, x_2, x_3, \ldots$ admits a convergent subsequence by the Bolzano-Weierstrass
Theorem (Theorem 1.17). In order to avoid writing this subsequence
with new symbols, we denote it again by $x_1, x_2, x_3, \ldots$ and we simply say: “after
extracting a subsequence, we suppose that” $\lim_{n\to\infty} x_n = \xi$. Since $f$ is continuous
at $\xi$, it follows from Theorem 3.3 that $\lim_{n\to\infty} f(x_n) = f(\xi)$. This contradicts
(3.10) and proves the boundedness of $f(x)$.

In order to prove the existence of $U \in [a, b]$ such that (3.9) holds, we consider
the set $Y = \{y : y = f(x), \; a \le x \le b\}$. This set is nonempty and bounded (as we
have just seen). Therefore, the supremum $M = \sup Y$ exists. By Definition 1.11
of the supremum, the value $M - \varepsilon$ (for an arbitrary $\varepsilon > 0$) is no longer an upper
bound of $Y$. Taking $\varepsilon = 1/n$, we thus find a sequence of elements $x_n \in [a, b]$
satisfying

(3.11) $M - 1/n < f(x_n) \le M$.

Applying the Bolzano-Weierstrass Theorem, after extracting a subsequence, we
suppose that $\{x_n\}$ converges and we denote the limit by $U = \lim_{n\to\infty} x_n$. Because
of the continuity of $f(x)$ at $U$, it follows from (3.11) that $f(U) = M$.

The existence of a minimum is proved similarly. □

Monotone and Inverse Functions 

(3.7) Definition. Let $A$ and $B$ be subsets of $\mathbb{R}$. The function $f : A \to B$ is

• injective if $f(x_1) \ne f(x_2)$ for $x_1 \ne x_2$,

• surjective if $\forall y \in B \;\; \exists x \in A \quad f(x) = y$,

• bijective if it is injective and surjective,

• increasing if $f(x_1) < f(x_2)$ for $x_1 < x_2$,

• decreasing if $f(x_1) > f(x_2)$ for $x_1 < x_2$,

• nondecreasing if $f(x_1) \le f(x_2)$ for $x_1 < x_2$,

• nonincreasing if $f(x_1) \ge f(x_2)$ for $x_1 < x_2$,

• monotone if it is nonincreasing or nondecreasing, and

• strictly monotone if it is increasing or decreasing.

Strictly monotone functions are injective. It is interesting that for real continuous
functions, defined on an interval, the converse statement is true, too.

(3.8) Lemma. If $f : [a, b] \to \mathbb{R}$ is continuous and injective, then $f$ is strictly
monotone.

Proof. For any three points $u < v < w$ we have

(3.12) $f(v)$ is between $f(u)$ and $f(w)$.

Indeed, suppose $f(v)$ is outside this interval and, say,
closer to $f(u)$. Then there is a $\xi$ between $v$ and $w$ with
$f(u) = f(\xi)$ (Theorem 3.5). This is in contradiction
to the injectivity of $f$. Therefore, for $a < c < d < b$
the only possibilities are

$f(a) < f(c) < f(d) < f(b)$ or $f(a) > f(c) > f(d) > f(b)$;

all other configurations of the inequalities contradict (3.12). □

Surjectivity of a function $f : A \to B$ implies that every $y \in B$ has at
least one inverse image. Injectivity then implies uniqueness of this inverse image.
Therefore, a bijective function has an inverse function $f^{-1} : B \to A$, defined by

(3.13) $f^{-1}(y) = x \iff f(x) = y$.

(3.9) Theorem. Let $f : [a, b] \to [c, d]$ be continuous and bijective. Then, the
inverse function $f^{-1} : [c, d] \to [a, b]$ is also continuous.

Proof. Let $\{y_n\}$ with $y_n \in [c, d]$ be a sequence satisfying $\lim_{n\to\infty} y_n = y_0$. By
Theorem 3.3, we have to show that $\lim_{n\to\infty} f^{-1}(y_n) = f^{-1}(y_0)$. We therefore
consider the sequence $\{x_n\} = \{f^{-1}(y_n)\}$. Let $\{x'_n\}$ be a convergent subsequence
(which exists by the theorem of Bolzano-Weierstrass), and denote its limit by $x_0$.
The continuity of $f(x)$ at $x_0$ implies that

$f(x_0) = \lim_{n\to\infty} f(x'_n) = \lim_{n\to\infty} y'_n = y_0$,

and consequently $x_0 = f^{-1}(y_0)$. Therefore, each convergent subsequence of
$\{x_n\} = \{f^{-1}(y_n)\}$ converges to $f^{-1}(y_0)$. This point is the only accumulation
point of the sequence $\{f^{-1}(y_n)\}$ and we have $f^{-1}(y_n) \to f^{-1}(y_0)$ (see also Exercise 1.8). □

Example. Each of the real functions $x^2, x^3, \ldots$ is strictly monotone on $[0, \infty)$ and
has there an inverse function: $\sqrt{x}, \sqrt[3]{x}, \ldots$. By Theorem 3.9, these functions are
continuous.

Limit of a Function 

The concept of the limit of a function was probably first defined with suffi- 
cient rigour by Weierstrass. 

(Pringsheim 1899, Enzyclopädie der Math. Wiss., Band II.1, p. 13)
Assume that $f(x)$ is not continuous at $x_0$ or not even defined there; in such a
situation it is interesting to know whether there exists, at least, the limit of $f(x)$
for $x$ approaching $x_0$. Obviously, $x_0$ has to be close to the domain of $f$. We say
that $x_0$ is an accumulation point of a set $A$ if

(3.14) $\forall \delta > 0 \;\; \exists x \in A \quad 0 < |x - x_0| < \delta$.

For a bounded interval, the accumulation points consist of the interval and of the
two endpoints.

(3.10) Definition. Consider a function $f : A \to \mathbb{R}$ and let $x_0$ be an accumulation
point of $A$. We say that the limit of $f(x)$ at $x_0$ exists and is equal to $y_0$, i.e.,

(3.15) $\lim_{x \to x_0} f(x) = y_0$

if

(3.16) $\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in A : \; 0 < |x - x_0| < \delta \;\Rightarrow\; |f(x) - y_0| < \varepsilon$.

This definition can be modified to cover the situations $x_0 = \pm\infty$ and/or $y_0 = \pm\infty$
(see, for example, Eq. (1.10)). The assumption that $x_0$ is an accumulation
point implies that the set of $x \in A$ satisfying $0 < |x - x_0| < \delta$ is never empty.

With Definition 3.10, the continuity of $f(x)$ at $x_0$ can be expressed as follows
(see Definition 3.2):

(3.17) $\lim_{x \to x_0} f(x)$ exists and $\lim_{x \to x_0} f(x) = f(x_0)$.

Examples. The function of Fig. 3.1b has a limit $\lim_{x \to x_0} f(x)$ that is different
from $f(x_0)$. For the function (3.4), the limit $\lim_{x \to x_0} f(x)$ exists for all $x_0$ (see
Exercise 3.1; remember that the point $x_0$ is explicitly excluded in Definition 3.10)
and $\lim_{x \to x_0} f(x) = 0$.

A still weaker property is the existence of one-sided limits.

(3.11) Definition. We say that the left-sided (respectively right-sided) limit of $f(x)$
at $x_0$ exists if (3.16) holds under the restriction $x < x_0$ (respectively $x_0 < x$).
These limits are denoted by

(3.18) $\lim_{x \to x_0-} f(x) = y_0$ respectively $\lim_{x \to x_0+} f(x) = y_0$.


The functions of Figs. 3.1b, 3.1c, and 3.1d possess left- and right-sided limits;
these limits do not exist for the functions of Figs. 3.1e and 3.1f.

The following theorem is an analog to Cauchy’s criterion in Theorem 1.8. 

(3.12) Theorem (Dedekind 1872). The limit $\lim_{x \to x_0} f(x)$ exists if and only if

(3.19) $\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x, \hat{x} \in A : \; 0 < |x - x_0| < \delta, \; 0 < |\hat{x} - x_0| < \delta \;\Rightarrow\; |f(x) - f(\hat{x})| < \varepsilon$.

Proof. The “only if” part follows from

$|f(x) - f(\hat{x})| \le |f(x) - y_0| + |y_0 - f(\hat{x})| < 2\varepsilon$.

For the “if” part we choose a sequence $\{x_i\}$ with $x_i \in A$ which converges to $x_0$.
Because of (3.19) the sequence $\{y_i\}$ with $y_i = f(x_i)$ is a Cauchy sequence and,
by Theorem 1.8, converges to, say, $y_0$. For an $x$ satisfying $0 < |x - x_0| < \delta$ we
now have, again from (3.19),

$|f(x) - y_0| \le |f(x) - f(x_i)| + |f(x_i) - y_0| < 2\varepsilon$,

for $i$ sufficiently large. □

Analogous results hold for the situation where $x_0 = \pm\infty$ or for one-sided
limits.

Exercises 


3.1 Show that the function (3.4) is continuous at all irrational $x_0$ and, of course,
discontinuous at rational $x_0$.

Hint. If you have difficulties, set $x_0 = \sqrt{2} - 1$ and $\varepsilon = 1/10$ and determine
for which values of $x$ you have $f(x) \ge \varepsilon$. This gives you a $\delta$ for which the
statement in Definition 3.2 is satisfied.

3.2 (Pringsheim 1899, p. 7). Show that Dirichlet’s function (3.3) can be written

$f(x) = \lim_{n\to\infty} \lim_{m\to\infty} |\cos(n!\,\pi x)|^m$.


3.3 Compute the limits 


Remember that $(\sqrt{a} - \sqrt{b})(\sqrt{a} + \sqrt{b}) = a - b$.

3.4 Show: if $f : [a, b] \to [c, d]$ is continuous at $x_0$, and $g : [c, d] \to [u, v]$ is
continuous at $y_0 = f(x_0)$, then the composite function $(g \circ f)(x) = g(f(x))$
is continuous at $x_0$.

3.5 Here is a list of functions $f : A \to \mathbb{R}$,

 1) $f(x) = x \cdot \sin(1/x) - 2x$, $A = [0, 0.2]$
 2) $f(x) = x/(x^2 + 1)$, $A = [-4, +4]$
 3) the same, $A = (-\infty, +\infty)$
 4) $f(x) = (1/\sqrt{\sin x}) - 1$, $A = (0, \pi)$
 5) the same, $A = [0, \pi]$
 6) $f(x) = \sqrt{x} \cdot \sin(x^2)$, $A = [0, 7]$
 7) the same, $A = [0, \infty)$
 8) $f(x) = \arctan((x - 0.5)/(x^2 - 0.1x - 0.7))$, $A = [-1.5, 1.5]$
 9) $f(x) = \sin(x^2)$, $A = [-5, 5]$
10) the same, $A = (-\infty, \infty)$
11) $f(x) = \sqrt{x}$, $A = [-1, 1]$
12) the same, $A = (-\infty, \infty)$
13) $f(x) = \cos x + 0.1 \sin(40x)$, $A = [-1.6, 1.6]$
14) $f(x) = x - [x]$, $A = [0, 3]$
15) $f(x) = \sqrt{x} \cdot \sin(1/x) - 2\sqrt{x}$, $A = [0, 0.1]$
16) $f(x) = 3 - 1/\sqrt{x(1 - x)}$, $A = (0, 1)$
17) $f(x) = \sin(5/x) - x$, $A = [0, 0.4]$

where $[x]$ denotes the largest integer not exceeding $x$. Whenever the above
definitions for $f(x)$ do not make sense (for example when a certain denominator
is zero), set $f(x) = 0$. Decide which of these functions are graphed in
Fig. 3.3.

3.6 Which of the functions of Exercise 3.5 are continuous on $A$? What are the
points of discontinuity?

3.7 Which of the functions of Exercise 3.5 possess a maximum value on $A$; which
possess a minimum value on $A$?

III.4 Uniform Convergence and Uniform Continuity

The following theorem can be found in the work of Mr. Cauchy: “If the
various terms of the series $u_0 + u_1 + u_2 + \ldots$ are continuous functions, . . .
then the sum $s$ of the series is also a continuous function of $x$.” But it seems
to me that this theorem admits exceptions. For example the series
$\sin x - \frac{1}{2}\sin 2x + \frac{1}{3}\sin 3x - \cdots$
is discontinuous at each value $(2m + 1)\pi$ of $x$, . . .

(Abel 1826, Oeuvres, vol. 1, p. 224-225)

The Cauchy-Bolzano era (first half of 19th century) left analysis with two im- 
portant gaps: first the concept of uniform convergence, which clarifies the limit of 
continuous functions and the integral of limits; second the concept of uniform con- 
tinuity, which ensures the integrability of continuous functions. Both gaps were 
filled by Weierstrass and his school (second half of 19th century). 

The Limit of a Sequence of Functions 

We consider a sequence of functions $f_1, f_2, f_3, \ldots : A \to \mathbb{R}$. For a chosen $x \in A$
the values $f_1(x), f_2(x), f_3(x), \ldots$ are a sequence of numbers. If the limit

(4.1) $\lim_{n\to\infty} f_n(x) = f(x)$

exists for all $x \in A$, we say that $\{f_n(x)\}$ converges pointwise on $A$ to $f(x)$.

Cauchy announced in his Cours (1821, p. 131; Oeuvres II.3, p. 120) that if
(4.1) converges for all $x$ in $A$ and if all $f_n(x)$ are continuous, then $f(x)$ is also
continuous. Here are four counterexamples to this assertion; the first one is due to
Abel (1826, see the quotation above).

Examples. 

a) (Abel 1826, see the upper left picture of Fig. 4.1)

(4.2a) $f_n(x) = \sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \frac{\sin 4x}{4} + \cdots \pm \frac{\sin nx}{n}$.

Fig. 4.1 shows $f_1(x)$, $f_2(x)$, $f_3(x)$ and $f_{100}(x)$. Apparently, $\{f_n(x)\}$ converges to
the line $y = x/2$ for $-\pi < x < \pi$ (this can be proved using the theory of Fourier
series), but $f_n(\pi) = 0$ and for $\pi < x < 3\pi$ the limit is $y = x/2 - \pi$. Thus, the
limit function is discontinuous.

b) (upper right picture of Fig. 4.1)

(4.2b) $f_n(x) = x^n$ on $A = [0, 1]$, $\lim_{n\to\infty} f_n(x) = \begin{cases} 0 & 0 \le x < 1 \\ 1 & x = 1. \end{cases}$

c) (lower left picture of Fig. 4.1) 



$\lim_{n\to\infty} f_n(x) = \begin{cases} -1 & |x| < 1 \\ 0 & x = 1 \\ +1 & x > 1. \end{cases}$


(4.2c) f n (x) 

FIGURE 4.1. Sequences of continuous functions with a discontinuous limit


d) (lower right picture of Fig. 4.1)

(4.2d) $f_n(x) = (1 - x^2)^n$ on $A = [-1, 1]$, $\lim_{n\to\infty} f_n(x) = \begin{cases} 0 & x \ne 0 \\ 1 & x = 0. \end{cases}$

Another example, which we have already encountered, is $f_n(x) = \arctan(nx)$
(see (3.2)).



FIGURE 4.2. Sequence of uniformly convergent functions 


Explanation (Seidel 1848). We look at the upper right picture of Fig. 4.1. The
closer $x$ is chosen to the point $x = 1$, the slower is the convergence and the
larger we must take $n$ in order to obtain the prescribed precision $\varepsilon$. This allows
the discontinuity to be created. We must therefore require that, for a given $\varepsilon > 0$,
the difference $f_n(x) - f(x)$ be smaller than $\varepsilon$ for all $x \in A$, if, of course, $n \ge N$
(see Fig. 4.2).
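
Seidel's observation can be quantified for the sequence $f_n(x) = x^n$ of (4.2b). The following sketch computes, for a given point $0 < x < 1$ and precision $\varepsilon$, the smallest index $n$ with $x^n < \varepsilon$; the function name `steps_needed` and the sample values are our own illustrative choices.

```python
import math

def steps_needed(x: float, eps: float) -> int:
    """Smallest n >= 1 with x**n < eps, for 0 < x < 1 and 0 < eps < 1."""
    n = max(1, math.ceil(math.log(eps) / math.log(x)))
    while x ** n >= eps:                    # correct a possible undershoot from rounding
        n += 1
    while n > 1 and x ** (n - 1) < eps:     # correct a possible overshoot from rounding
        n -= 1
    return n

eps = 0.01  # illustrative precision
for x in (0.5, 0.9, 0.99, 0.999, 0.9999):
    print(x, steps_needed(x, eps))
```

The required index grows without bound as $x$ approaches 1, so no single $N$ works for all $x \in [0, 1)$ simultaneously. This is exactly the failure of the uniformity requirement formulated above.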

(4.1) Definition (Weierstrass 1841). The sequence $f_n : A \to \mathbb{R}$ converges uniformly
on $A$ to $f : A \to \mathbb{R}$ if

(4.3) $\forall \varepsilon > 0 \;\; \exists N \ge 1 \;\; \forall n \ge N \;\; \forall x \in A \quad |f_n(x) - f(x)| < \varepsilon$.

In this definition, it is important that $N$ depends only on $\varepsilon$ and not on $x \in A$.
This is why “$\forall x \in A$” stands after “$\exists N \ge 1$” in (4.3).

As in Sect. III.1 (Definition 1.7), we can replace $f(x)$ in (4.3) by all successors
of $f_n(x)$. We then get Cauchy’s criterion for uniform convergence:

(4.4) $\forall \varepsilon > 0 \;\; \exists N \ge 1 \;\; \forall n \ge N \;\; \forall k \ge 1 \;\; \forall x \in A \quad |f_n(x) - f_{n+k}(x)| < \varepsilon$.


(4.2) Theorem (Weierstrass’s lectures of 1861). If $f_n : A \to \mathbb{R}$ are continuous
functions and if $f_n(x)$ converges uniformly on $A$ to $f(x)$, then $f : A \to \mathbb{R}$ is
continuous.

FIGURE 4.3. Continuity of $f(x)$

Proof. The idea is to decompose $f(x) - f(x_0)$ “in drei Theile $\varepsilon_1$ $\varepsilon_2$ $\varepsilon_3$” and then to
use an estimate for $f_n(x) - f_n(x_0)$, and the estimate (4.3) twice (see Fig. 4.3). For
a given $\varepsilon > 0$ we choose $N$ such that (4.3) is satisfied. Since the function $f_N(x)$
is continuous, there exists a $\delta > 0$ such that $|f_N(x) - f_N(x_0)| < \varepsilon$ whenever
$|x - x_0| < \delta$. With the triangle inequality, we thus get for $|x - x_0| < \delta$

$|f(x) - f(x_0)| \le \underbrace{|f(x) - f_N(x)|}_{< \varepsilon} + \underbrace{|f_N(x) - f_N(x_0)|}_{< \varepsilon} + \underbrace{|f_N(x_0) - f(x_0)|}_{< \varepsilon} < 3\varepsilon$,

which is arbitrarily small. □


Question. Is there a sequence of continuous functions $f_n(x)$ that converges to a
continuous function $f(x)$ such that the convergence $f_n(x) \to f(x)$ is not uniform?
As we have seen above, uniform convergence is a necessary hypothesis
for Theorem 4.2, but it might not be necessary for a particular example. For the
history of this problem, which occupied many mathematicians between 1850 and
1880 with numerous attempts and a wrong “proof”, see G. Cantor (1880).

First Example (similar to Cantor’s):

(4.5) $f_n(x) = \frac{2nx}{1 + n^2 x^2}$.

It can easily be seen that $\lim_{n\to\infty} f_n(x) = 0$ for any fixed $x \ne 0$. The functions
$f_n(x)$ possess a maximum of height 1 at $x = 1/n$ (see the left-hand picture of
Fig. 4.4), so the convergence is not uniform. The point is, however, that for $x = 0$
all functions $f_n(x)$ are 0. So we have convergence here also, and the limiting
function is continuous.
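
The two competing effects in example (4.5) are easy to tabulate. The sketch below (the function name `f` and the sample point $x = 0.1$ are illustrative assumptions) evaluates $f_n$ at a fixed point and at the moving point $x = 1/n$.

```python
def f(n: int, x: float) -> float:
    """Function (4.5): 2nx / (1 + n^2 x^2)."""
    return 2.0 * n * x / (1.0 + (n * x) ** 2)

for n in (1, 10, 100, 1000, 10_000):
    # second column: value at the fixed point x = 0.1 (tends to 0)
    # third column:  value at the moving point x = 1/n (stays at height 1)
    print(n, f(n, 0.1), f(n, 1.0 / n))
```

At the fixed point the values tend to zero, while the value at $x = 1/n$ remains at height 1 for every $n$. Hence $\sup_x |f_n(x) - 0|$ does not tend to zero, although the pointwise limit is the continuous function 0.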


The second example is of a similar nature and still easier to understand (right-hand 
picture of Fig. 4.4): 


(4.6) 



For a third example see Exercise 4.1. 


0<x<l/n 
l/n< x < 2/n 
2/n < x. 



FIGURE 4.4. Nonuniform convergence to a continuous limit


Weierstrass’s Criterion for Uniform Convergence 

We now consider the important case where the functions are partial sums

(4.7) $s_n(x) = \sum_{i=0}^{n} f_i(x)$

with real functions $f_i : A \to \mathbb{R}$. We call the series

(4.8) $\sum_{i=0}^{\infty} f_i(x)$

uniformly convergent on $A$, if the sequence $\{s_n(x)\}$ of (4.7) converges uniformly
on $A$.

(4.3) Theorem (Weierstrass’s Criterion). Let

(4.9) $|f_n(x)| \le c_n$ for all $x \in A$

and let $\sum_{n=0}^{\infty} c_n$ be a convergent series of numbers; then the series (4.8) converges
uniformly on $A$.

Proof. It is clear from (4.9) that $c_n \ge 0$. We further have

$|s_{n+k}(x) - s_n(x)| = |f_{n+k}(x) + \cdots + f_{n+1}(x)|$
$\le |f_{n+k}(x)| + \cdots + |f_{n+1}(x)| \le c_{n+k} + \cdots + c_{n+1} < \varepsilon$.

The last estimate holds for $n \ge N$ and all $k \ge 1$, because, by hypothesis, the
series $\sum c_n$ converges. The assertion now follows from Cauchy’s Criterion (4.4). □


Examples. a) Since $|\sin(n^2 x)| \le 1$ and $\sum 1/n^2$ is convergent, the series (3.7)
converges uniformly on $\mathbb{R}$ and represents a continuous function. On the other
hand, Abel’s example (4.2a) needs the divergence of the series
$1 + 1/2 + 1/3 + 1/4 + 1/5 + \ldots$ in order that the limit function be discontinuous.

b) The series for the exponential function,

(4.10) $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$,

converges for all $x \in \mathbb{R}$, but does not converge uniformly on $\mathbb{R}$ (see Fig. 1.2.6b).
In order to apply our theorem nevertheless, we choose a fixed $u$ and consider
$A = [-u, u]$. Since we know that $\sum_{n=0}^{\infty} u^n/n!$ converges and since
$|x^n/n!| \le u^n/n!$ for $|x| \le u$, we conclude from Theorem 4.3 that the series (4.10) converges
uniformly on each closed interval $[-u, u]$. Since $u$ was arbitrary, we obtain that
$e^x$ is continuous for all $x \in \mathbb{R}$.
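
The estimate in the proof of Theorem 4.3 can be checked numerically for the series (3.7), where $c_n = 1/n^2$. In the following sketch the function names `riemann_partial_sum` and `tail_bound`, the indices 100 and 200, and the sample points are our own illustrative choices.

```python
import math

def riemann_partial_sum(x: float, n: int) -> float:
    """Partial sum s_n(x) of the series (3.7): sum of sin(i^2 x)/i^2 for i = 1..n."""
    return sum(math.sin(i * i * x) / (i * i) for i in range(1, n + 1))

def tail_bound(n: int, m: int) -> float:
    """c_{n+1} + ... + c_m with c_i = 1/i^2, the majorant used in Theorem 4.3."""
    return sum(1.0 / (i * i) for i in range(n + 1, m + 1))

bound = tail_bound(100, 200)
for x in (0.0, 0.3, 1.0, 2.5):
    diff = abs(riemann_partial_sum(x, 200) - riemann_partial_sum(x, 100))
    print(x, diff, "<=", bound)
```

The same bound, which does not depend on $x$, controls the difference of partial sums at every sample point. This independence of $x$ is what makes the convergence uniform.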


Uniform Continuity

It has apparently not yet been observed, that . . . continuity at any single 
point ... is not the continuity . . . which can be called uniform continuity, 
because it extends uniformly to all points and in all directions. 

(Heine 1870, p. 361) 

The general ideas of the proof of several theorems in §3 according to the
principles of Mr. Weierstrass are known to me by oral communications
from himself, from Mr. Schwarz and Mr. Cantor, so that . . .

(Heine 1872, p. 182) 

Definition 3.2 for the continuity of a function $f : A \to \mathbb{R}$ ensures for each $x_0 \in A$
and each $\varepsilon > 0$ the existence of a $\delta > 0$ such that the variation $|f(x) - f(x_0)|$
is bounded by $\varepsilon$ if $|x - x_0|$ is bounded by $\delta$. The problem is that this $\delta$ is not
necessarily the same for all $x_0 \in A$.

FIGURE 4.5. Nonuniformly continuous functions (a) and (b), uniformly continuous (c)

Examples. Fig. 4.5 shows the graphs of $y = 1/x$ for $A = (0, 1]$ and of $y = x^2$
for $A = [0, \infty)$. In both cases, it can be observed that the $\delta$, which is necessary
to ensure that $|f(x) - f(x_0)| < \varepsilon$ for a given $\varepsilon$, tends to zero, in the first case
for $x_0 \to 0$, in the second case for $x_0 \to \infty$. On the contrary (Fig. 4.5c), for the
function $y = \sqrt{x}$ on $A = [0, 1]$, in spite of the infinite slope of the curve at the
origin, there is a smallest $\delta_{\min} = \varepsilon^2$, which is positive. This $\delta_{\min}$, though usually
unnecessarily small, can be used throughout the whole interval $A = [0, 1]$. We
call this property uniform continuity, a notion that emerged slowly in lectures of
Dirichlet in 1854 and of Weierstrass in 1861. The first publication is due to Heine
(1870, p. 353).
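
The contrast between $y = \sqrt{x}$ on $[0, 1]$ and $y = 1/x$ on $(0, 1]$ can be tested on a sample grid. The sketch below uses the single value $\delta = \varepsilon^2$ for both functions; the helper `worst_variation`, the grid spacing, and the factor 0.999 (which keeps $|x - x_0|$ strictly below $\delta$) are illustrative assumptions, and a finite grid can of course only suggest, not prove, the uniform bound.

```python
import math

def worst_variation(f, delta, base_points, upper):
    """Largest |f(x) - f(x0)| over x0 in base_points, with x = min(x0 + 0.999*delta, upper)."""
    worst = 0.0
    for x0 in base_points:
        x = min(x0 + 0.999 * delta, upper)   # a point with |x - x0| < delta inside the domain
        worst = max(worst, abs(f(x) - f(x0)))
    return worst

eps = 0.1
delta = eps ** 2                                  # the delta_min of the text for sqrt(x)
grid_closed = [k / 1000 for k in range(0, 1001)]  # sample of [0, 1]
grid_open = [k / 1000 for k in range(1, 1001)]    # sample of (0, 1]

sqrt_var = worst_variation(math.sqrt, delta, grid_closed, 1.0)
recip_var = worst_variation(lambda x: 1.0 / x, delta, grid_open, 1.0)
print("sqrt(x):", sqrt_var, " 1/x:", recip_var)
```

For $\sqrt{x}$ the worst variation stays below $\varepsilon$ at every sample point, whereas for $1/x$ the same $\delta$ fails badly near the origin.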

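(4.4) Definition. A function $f : A \to \mathbb{R}$ is uniformly continuous on $A$ if

$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x_0 \in A \;\; \forall x \in A : \; |x - x_0| < \delta \;\Rightarrow\; |f(x) - f(x_0)| < \varepsilon$.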

Remark. The uniform continuity of a given function can often be shown using
Lagrange’s Mean Value Theorem (see Theorem III.6.11 below),

(4.11) $f(x) - f(x_0) = f'(\xi)(x - x_0)$.

If $A$ is an interval and $f$ differentiable in $A$ with

(4.12) $M = \sup_{\xi \in A} |f'(\xi)| < \infty$,

then, for a given $\varepsilon$, we satisfy the condition of Definition 4.4 by simply putting
$\delta = \varepsilon/M$ (see also Exercise 4.3 below). However, differentiability is by no means
necessary, and we have the following astonishing theorem.

(4.5) Theorem (Heine 1872). Let $A$ be a closed interval $[a, b]$ and let the function
$f : A \to \mathbb{R}$ be continuous on $A$; then $f$ is uniformly continuous on $A$.

First Proof (after Heine 1872, p. 188). We assume the negation of the condition in
Definition 4.4 and choose $\delta = 1/n$ for $n = 1, 2, \ldots$. This yields

(4.13a) $\exists \varepsilon > 0 \;\; \forall\, 1/n > 0 \;\; \exists x_{0n} \in A \;\; \exists x_n \in A : \; |x_n - x_{0n}| < 1/n$
(4.13b) such that $|f(x_n) - f(x_{0n})| \ge \varepsilon$.

After extracting a convergent subsequence from $\{x_n\}$ (which we again denote by
$\{x_n\}$; see Theorem 1.17), we have $\lim_{n\to\infty} x_n = x$, and since $|x_n - x_{0n}| < 1/n$
we also have $\lim_{n\to\infty} x_{0n} = x$. Since $f$ is continuous, we have (see Theorem 3.3)

$\lim_{n\to\infty} f(x_n) = f(x) = \lim_{n\to\infty} f(x_{0n})$,

in contradiction with (4.13b). □

Second Proof (Lüroth 1873). Let an $\varepsilon > 0$ be chosen. For each $x \in [a, b]$ let
$\delta(x) > 0$ be the length of the largest open interval $I$ of center $x$ such that
$|f(y) - f(z)| < \varepsilon$ for $y, z \in I$. More precisely,

(4.14) $\delta(x) = \sup\{\delta > 0 \mid \forall y, z \in [x - \delta/2, x + \delta/2] \;\; |f(y) - f(z)| < \varepsilon\}$

(where, of course, the values $x$, $y$, and $z$ have to lie in $A$). By continuity of $f(x)$
at $x$, the set $\{\delta > 0 \mid \ldots\}$ in (4.14) is nonempty, so that $\delta(x) > 0$ for all $x \in A$.
If $\delta(x) = \infty$ for some $x \in A$, the estimate $|f(y) - f(z)| < \varepsilon$ holds without any
restriction and any $\delta > 0$ will satisfy the condition in Definition 4.4.

FIGURE 4.6. Lüroth’s proof of Theorem 4.5

If $\delta(x) < \infty$ for all $x \in A$, we move $x$ to $x \pm \eta$. The new interval $I'$ cannot be
longer than $\delta(x) + 2|\eta|$, otherwise $I$ would be entirely in $I'$ and could be extended.
Neither can it be smaller than $\delta(x) - 2|\eta|$. Thus, this $\delta(x)$ is a continuous function.
Weierstrass’s Maximum Theorem (Theorem 3.6), applied here in its “minimum”
version, ensures that there is a value $x_0$ such that $\delta(x_0) \le \delta(x)$ for all $x \in A$.
This value $\delta(x_0)$ is positive by definition and can be used to satisfy the condition
in Definition 4.4. □

Remark. If you are unsatisfied with both proofs above, you can read a third one,
published by Darboux (1875, p. 73-74), which is based on repeated subdivision of
intervals.

Exercises

4.1 Show that the functions

    $f_n(x) = (n + 1)\,x^n (1 - x)$,   $x \in A = [0, 1]$,

converge to zero for all $x \in A$, but possess a maximum at $x = n/(n + 1)$
of asymptotic height $1/e$. Therefore, we do not have uniform convergence
despite the fact that the limiting function is continuous.

4.2 (Pringsheim 1899, p. 34). Show that the series 

a) converges absolutely for all $x \in \mathbb{R}$ and

b) does not converge uniformly on $[-1, 1]$.

c) Compute $f(x)$. Is it continuous?

4.3 The function $f : [0, 1] \to \mathbb{R}$ defined by

,, , _ f ^ ‘ (sin \ + 2) if 0 < x < 1, 

* _ lo if a; = 0 

is continuous on $[0, 1]$, and should therefore be uniformly continuous. Find
explicitly for a given $\varepsilon > 0$, say $\varepsilon = 0.01$, a $\delta > 0$ for which we have

    $\forall\, x_1, x_2 \in [0, 1] : \ |x_1 - x_2| < \delta \ \Rightarrow\ |f(x_1) - f(x_2)| < \varepsilon$.

Hint. Use the Mean Value Theorem away from the origin and a direct estimate
for values close to 0.

4.4 Which of the functions of Fig. 3.3 (see Exercise 3.5) are uniformly continuous
on $A$?

III.5 The Riemann Integral

Our first question is therefore: what meaning should we give to $\int_a^b f(x)\,dx$?
(Riemann 1854, Werke, p. 239)

By one of those insights of which only the greatest minds are capable,
the famous geometer [Riemann] generalises the concept of the definite
integral, . . . (Darboux 1875)

The discussion of the integral in Sects. II.5 and II.6 was based on the formula

(5.1)    $\int_a^b f(x)\,dx = F(b) - F(a)$,

where $F(x)$ is a primitive of $f(x)$. We have implicitly assumed that such a primitive
always exists and is unique (up to an additive constant). Here, we will give a
precise definition of $\int_a^b f(x)\,dx$ independent of differential calculus. This allows
us to interpret $\int_a^b f(x)\,dx$ for a larger class of functions, including discontinuous
functions or functions for which a primitive is not known. A rigorous proof of
(5.1) for continuous $f$ will then be given in Sect. III.6 below.

Cauchy (1823) described, as rigorously as was then possible, the integral of
a continuous function as the limit of a sum. Riemann (1854), merely as a side-remark
in his habilitation thesis on trigonometric series, defined the integral for
more general functions. In this section, we shall describe Riemann’s theory and
its extensions by Du Bois-Reymond and Darboux. Still more general theories, not
treated here, are due to Lebesgue (in 1902) and Kurzweil (in 1957).

General Assumptions. Throughout this section, we shall consider functions
$f : [a, b] \to \mathbb{R}$, where $[a, b] = \{x \mid a \le x \le b\}$ is a bounded interval and $f(x)$ is a
bounded function, i.e.,

(5.2)    $\exists\, M \ge 0 \ \ \forall\, x \in [a, b] : \ |f(x)| \le M$.

Otherwise, the definition of Darboux sums (below) would not be possible. Situations
that violate one of these assumptions will be discussed in Sect. III.8.

Definitions and Criteria of Integrability 

We want to define the integral as the area between the function and the horizontal
axis. The idea is to divide the interval $[a, b]$ into small subintervals and to approximate
the area by a sum of small rectangles. A division into subintervals is denoted
by

(5.3)    $D = \{x_0, x_1, x_2, \ldots, x_n\}$

(where $a = x_0 < x_1 < \ldots < x_n = b$) and the length of a subinterval is
$\delta_i = x_i - x_{i-1}$. We then define the lower and upper Darboux sums (see Fig. 5.1)
by

(5.4)    $s(D) = \sum_{i=1}^n f_i\,\delta_i$,   $S(D) = \sum_{i=1}^n F_i\,\delta_i$,

where

(5.5)    $f_i = \inf_{x_{i-1} \le x \le x_i} f(x)$,   $F_i = \sup_{x_{i-1} \le x \le x_i} f(x)$.

Obviously, we have $s(D) \le S(D)$ and any reasonable definition of the integral
$\int_a^b f(x)\,dx$ must give a value between $s(D)$ and $S(D)$.

A division $D'$ of $[a, b]$ is called a refinement of $D$, if it contains the points of
$D$, i.e., if $D' \supset D$.




FIGURE 5.2. Refinement of a division


(5.1) Lemma. If $D'$ is a refinement of $D$, then

    $s(D) \le s(D') \le S(D') \le S(D)$.

Proof. Adding a single point to the division $D$ increases the lower Darboux sum
(or does not change it) and decreases the upper sum (or does not change it,
Fig. 5.2). Repeated addition of points yields the statement. □

(5.2) Lemma. Let $D_1$ and $D_2$ be two arbitrary divisions; then

    $s(D_1) \le S(D_2)$.

Proof. We take $D' = D_1 \cup D_2$, the division containing all points of the two
divisions (points appearing twice are counted only once). Since $D'$ is a refinement
of $D_1$ and of $D_2$, the statement follows from Lemma 5.1. □

Lemma 5.2 implies that, for a given function $f : [a, b] \to \mathbb{R}$, the set of lower
Darboux sums is majorized by every upper Darboux sum (and vice versa):

(5.6)    $\{\, s(D) : D \text{ a division} \,\} \ \le\ \{\, S(D) : D \text{ a division} \,\}$.



Therefore (Theorem 1.12), it makes sense to consider the supremum of the lower
sums and the infimum of the upper sums. Following Darboux (1875), we introduce
the notation

(5.7)    $\overline{\int_a^b} f(x)\,dx = \inf_D S(D)$    the upper integral,
(5.8)    $\underline{\int_a^b} f(x)\,dx = \sup_D s(D)$    the lower integral.

(5.3) Definition. A function $f : [a, b] \to \mathbb{R}$, satisfying (5.2), is called integrable (in
the sense of Riemann), if the lower and upper integrals (5.7) and (5.8) are equal.
In that case, we remove the bars in (5.7) and (5.8) and we obtain the “Riemann
integral”.

(5.4) Theorem. A function $f : [a, b] \to \mathbb{R}$ is integrable if and only if

(5.9)    $\forall\, \varepsilon > 0 \ \ \exists\, D : \ S(D) - s(D) < \varepsilon$.

Proof. By definition, the function $f(x)$ is integrable if and only if the two sets
in (5.6) are arbitrarily close. This means that, for a given $\varepsilon > 0$, there exist two
divisions $D_1$ and $D_2$ such that $S(D_2) - s(D_1) < \varepsilon$. Taking $D = D_1 \cup D_2$ and
applying Lemma 5.1 yields the statement. □


(5.5) Example. Consider the function $f(x) = x$ on an interval $[a, b]$. For the
equidistant division $D_n = \{x_i = a + ih \mid i = 0, 1, \ldots, n,\ h = (b - a)/n\}$, we
obtain from (I.1.28) that

    $s(D_n) = \sum_{i=1}^n x_{i-1}(x_i - x_{i-1}) = \frac{b^2}{2} - \frac{a^2}{2} - \frac{(b - a)^2}{2n}$,

    $S(D_n) = \sum_{i=1}^n x_i (x_i - x_{i-1}) = \frac{b^2}{2} - \frac{a^2}{2} + \frac{(b - a)^2}{2n}$,

so that $S(D_n) - s(D_n) = (b - a)^2/n$. For sufficiently large $n$ this expression
is smaller than any $\varepsilon > 0$. Therefore, the function is integrable and the integral
equals $b^2/2 - a^2/2$.
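The closed forms of Example 5.5 can be checked in exact rational arithmetic. The following Python sketch computes $s(D_n)$ and $S(D_n)$ directly from (5.4); the function name `darboux_sums_identity` and the sample interval $[1, 3]$ are illustrative choices of ours, not part of the text. It also checks the chain of Lemma 5.1 when $n$ is doubled, which refines the division.

```python
from fractions import Fraction


def darboux_sums_identity(a, b, n):
    """Lower and upper Darboux sums of f(x) = x on [a, b] for the
    equidistant division with n subintervals (exact rational arithmetic)."""
    a, b = Fraction(a), Fraction(b)
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    # f(x) = x is increasing: inf on [x_{i-1}, x_i] is x_{i-1}, sup is x_i.
    lower = sum(xs[i - 1] * h for i in range(1, n + 1))
    upper = sum(xs[i] * h for i in range(1, n + 1))
    return lower, upper


a, b, n = 1, 3, 10
s, S = darboux_sums_identity(a, b, n)
exact = Fraction(b * b - a * a, 2)          # b^2/2 - a^2/2
correction = Fraction((b - a) ** 2, 2 * n)  # (b - a)^2 / (2n)
print(s == exact - correction, S == exact + correction)

# Doubling n refines the division, so Lemma 5.1 predicts s <= s' <= S' <= S.
s2, S2 = darboux_sums_identity(a, b, 2 * n)
print(s <= s2 <= S2 <= S)
```

Exact fractions are used so that the comparison with the closed forms is an equality test and not a floating-point approximation.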

(5.6) Example. Dirichlet’s function $f : [0, 1] \to \mathbb{R}$, defined by (see (3.3))

    $f(x) = \begin{cases} 1 & x \text{ rational} \\ 0 & x \text{ irrational,} \end{cases}$

is not integrable in the sense of Riemann, because in every subinterval there are
rational and irrational numbers so that $f_i = 0$ and $F_i = 1$ for all $i$. Consequently,
$s(D) = 0$, $S(D) = 1$ for all divisions.

(5.7) Example. The function $f : [0, 1] \to \mathbb{R}$ (see (3.4))

    $f(x) = \begin{cases} 0 & x \text{ irrational or } x = 0 \\ 1/q & x = p/q \text{ reduced fraction} \end{cases}$

is discontinuous at all positive rational $x$. However, for a fixed $\varepsilon > 0$, only a finite
number (say $k$) of $x$-values are such that $f(x) > \varepsilon$. We now choose a division $D$
with $\max_i \delta_i < \varepsilon/k$, such that the $x$-values for which $f(x) > \varepsilon$ lie in the interior
of the subintervals. Because of $f(x) \le 1$, this implies

    $S(D) \le \varepsilon + k \cdot \max_i \delta_i < 2\varepsilon$.

Since $s(D) = 0$, we see that our function is integrable and that $\int_0^1 f(x)\,dx = 0$.
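The finiteness claim of Example 5.7 is easy to make concrete. The sketch below lists, for a given $\varepsilon$, all points of $[0, 1]$ where $f(x) > \varepsilon$ and evaluates the bound $\varepsilon + k \cdot \max_i \delta_i$ for the hypothetical choice $\max_i \delta_i = \varepsilon/(k + 1)$. The function names are ours, and the code only evaluates the bound; it does not construct the division itself.

```python
from fractions import Fraction


def points_above(eps):
    """All x in [0, 1] with f(x) > eps, where f(p/q) = 1/q for a reduced
    fraction p/q > 0 and f(x) = 0 otherwise."""
    eps = Fraction(eps)
    points = set()
    q = 1
    while Fraction(1, q) > eps:
        for p in range(1, q + 1):
            x = Fraction(p, q)
            if x.denominator == q:  # keep only reduced fractions
                points.add(x)
        q += 1
    return sorted(points)


def upper_sum_bound(eps):
    """The bound eps + k * max(delta_i) from the example, evaluated for
    max(delta_i) = eps / (k + 1), which is smaller than eps / k."""
    eps = Fraction(eps)
    k = len(points_above(eps))
    max_delta = eps / (k + 1)
    return eps + k * max_delta


eps = Fraction(1, 4)
print(points_above(eps))
print(upper_sum_bound(eps) < 2 * eps)
```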


The Theorem of Du Bois-Reymond and Darboux. 

I feel, however, that the manner in which the criterion of integrability was 
formulated leaves something to be desired. 

(Du Bois-Reymond 1875, p. 259) 


(5.8) Theorem (Du Bois-Reymond 1875, Darboux 1875). A function $f(x)$, satisfying
(5.2), is integrable if and only if

    $\forall\, \varepsilon > 0 \ \ \exists\, \delta > 0 \ \ \forall\, D \in \mathcal{D}_\delta : \ S(D) - s(D) < \varepsilon$.

Here, $\mathcal{D}_\delta$ denotes the set of all divisions satisfying $\max_i \delta_i \le \delta$.

Proof. The “if” part is a simple consequence of Theorem 5.4. The difficulty of the
“only if” part resides in the fact that the division $D$, about which we know nothing
but $\max_i \delta_i \le \delta$, can be quite different from the division of Theorem 5.4.

Let $\varepsilon > 0$ be fixed and let $\tilde{D}$ be a division satisfying (5.9), i.e., the shaded
area $\tilde{\Delta} = S(\tilde{D}) - s(\tilde{D})$ in Fig. 5.3a is smaller than $\varepsilon$. The important point is that
$\tilde{D} = \{\tilde{x}_0, \tilde{x}_1, \ldots, \tilde{x}_{\tilde{n}}\}$ consists of a finite number of points. Now take an arbitrary
division $D \in \mathcal{D}_\delta$ (see Fig. 5.3b) and set $\Delta = S(D) - s(D)$. We have to prove that
$\Delta$ becomes arbitrarily small if $\delta \to 0$.

FIGURE 5.3. Du Bois-Reymond and Darboux’s proof

Consider the union $D' = D \cup \tilde{D}$ of the two divisions and set $\Delta' = S(D') - s(D')$
(see Fig. 5.3c). The Darboux sums for $D'$ and $D$ are equal everywhere,
except on intervals that contain points of $\tilde{D}$ (Fig. 5.3d). Since we have at most
$\tilde{n} - 1$ such intervals, since their length is $\le \delta$, and since $-M \le f(x) \le M$, we
have

(5.10)    $\Delta \le \Delta' + 2(\tilde{n} - 1)\delta M$.

Together with $\Delta' \le \tilde{\Delta} < \varepsilon$ (observe that $D'$ is a refinement of $\tilde{D}$), this estimate
yields $\Delta < 2\varepsilon$ provided that $\delta \le \varepsilon/(2(\tilde{n} - 1)M)$. □
Riemann Sums. Consider a division (5.3) and let $\xi_1, \xi_2, \ldots, \xi_n$ be such that
$x_0 \le \xi_1 \le x_1 \le \xi_2 \le x_2 \le \xi_3 \le \ldots$. Then, we call

(5.11)    $\sigma = \sum_{i=1}^n f(\xi_i) \cdot \delta_i$

a Riemann sum. Because of (5.5), we have $f_i \le f(\xi_i) \le F_i$, so that
$s(D) \le \sigma \le S(D)$. Theorem 5.8 thus implies that

(5.12)    $\sum_{i=1}^n f(\xi_i) \cdot \delta_i \to \int_a^b f(x)\,dx$   if   $\max_i \delta_i \to 0$,

provided that $f : [a, b] \to \mathbb{R}$ is an integrable function.

Riemann sums are very convenient for proving properties of the integral. For
example, the limit $\max_i \delta_i \to 0$ of the trivial identity

    $\sum_{i=1}^n \bigl(c_1 f_1(\xi_i) + c_2 f_2(\xi_i)\bigr) \cdot \delta_i = c_1 \sum_{i=1}^n f_1(\xi_i) \cdot \delta_i + c_2 \sum_{i=1}^n f_2(\xi_i) \cdot \delta_i$

leads to (II.4.13), if the functions involved are integrable.
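The convergence (5.12) does not depend on how the points $\xi_i$ are chosen inside the subintervals. As a small numerical illustration, the following sketch draws the tags at random on an equidistant division and evaluates the Riemann sum for $f(x) = \sin x$ on $[0, \pi]$, whose integral is 2. The helper names and the fixed random seed are our own assumptions, chosen only to make the run reproducible.

```python
import math
import random


def riemann_sum(f, xs, tags):
    """Riemann sum (5.11): sum of f(xi_i) * (x_i - x_{i-1})."""
    return sum(f(t) * (xs[i + 1] - xs[i]) for i, t in enumerate(tags))


def random_tagged_division(a, b, n, rng):
    """Equidistant division of [a, b] with one random tag per subinterval."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    tags = [rng.uniform(xs[i], xs[i + 1]) for i in range(n)]
    return xs, tags


rng = random.Random(0)  # fixed seed, only for reproducibility
for n in (10, 100, 1000):
    xs, tags = random_tagged_division(0.0, math.pi, n, rng)
    print(n, riemann_sum(math.sin, xs, tags))  # approaches 2 as n grows
```

Since $s(D) \le \sigma \le S(D)$ and $S(D) - s(D) \le \pi^2/n$ for this function, the error of any such sum is at most $\pi^2/n$, whatever tags the random generator produces.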

Integrable Functions 

Let us investigate which classes of functions are integrable. 

(5.9) Theorem. Let $f$ and $g$ be two integrable functions on $[a, b]$ and let $\lambda$ be a
real number. Then the functions

    $f + g$,   $\lambda \cdot f$,   $f \cdot g$,   $|f|$,   $f/g$ (if $|g(x)| \ge C > 0$)

are again integrable.

Proof. We shall use throughout the proof the fact that $F_i - f_i$ represents the least
upper bound for the variations of $f(x)$ on $[x_{i-1}, x_i]$, i.e.,

(5.13)    $\sup_{x, y \in [x_{i-1}, x_i]} |f(x) - f(y)| = F_i - f_i$.

Indeed, suppose that $\varepsilon > 0$ is a given number. By the definition of $F_i$ and $f_i$, there
exist $\xi, \eta \in [x_{i-1}, x_i]$ such that $f(\xi) > F_i - \varepsilon$, $f(\eta) < f_i + \varepsilon$ and therefore
$f(\xi) - f(\eta) > F_i - f_i - 2\varepsilon$. Consequently, $F_i - f_i$ is not only an upper bound
for $|f(x) - f(y)|$, but also the least upper bound.

a) Let $h(x) = f(x) + g(x)$, and denote by $F_i$, $G_i$, $H_i$, respectively, $f_i$, $g_i$,
$h_i$, the supremum, respectively, infimum of $f$, $g$, $h$, on $[x_{i-1}, x_i]$ (see (5.5)). We
then have for $x, y \in [x_{i-1}, x_i]$, using the triangle inequality and (5.13),

(5.14)    $|h(x) - h(y)| \le |f(x) - f(y)| + |g(x) - g(y)| \le (F_i - f_i) + (G_i - g_i)$.

Thus, Eq. (5.13), applied to the function $h$, shows that $(H_i - h_i) \le (F_i - f_i) + (G_i - g_i)$,
and the differences of the upper and lower Darboux sums satisfy

(5.15)    $\sum_{i=1}^n (H_i - h_i)\delta_i \le \sum_{i=1}^n (F_i - f_i)\delta_i + \sum_{i=1}^n (G_i - g_i)\delta_i$.

For a given $\varepsilon > 0$ we choose a division $D$ (Theorem 5.4) such that each of the two
sums on the right side of (5.15) is smaller than $\varepsilon$ (in fact, we have two different
divisions for $f$ and $g$, but by taking their union we may suppose that they are the
same). Consequently, $\sum_{i=1}^n (H_i - h_i)\delta_i < 2\varepsilon$ and the function $h(x) = f(x) + g(x)$
is integrable by Theorem 5.4.

b) The proofs of the remaining assertions are very similar. For example, for
$h(x) = \lambda \cdot f(x)$ we use

    $|h(x) - h(y)| = |\lambda| \cdot |f(x) - f(y)|$

instead of (5.14), conclude that $(H_i - h_i) \le |\lambda| \cdot (F_i - f_i)$, and deduce integrability
as above. For $h(x) = |f(x)|$ the inequality $\bigl|\,|f(x)| - |f(y)|\,\bigr| \le |f(x) - f(y)|$ serves
the same purpose.

For the product $h(x) = f(x) \cdot g(x)$ we use

    $|h(x) - h(y)| \le |f(x)| \cdot |g(x) - g(y)| + |g(y)| \cdot |f(x) - f(y)|$
    $\phantom{|h(x) - h(y)|} \le M \cdot |g(x) - g(y)| + N \cdot |f(x) - f(y)|$

(both functions $f(x)$ and $g(x)$ are bounded by assumption (5.2)).

Finally, for the last assertion it suffices to prove that $1/g(x)$ is integrable
(because $f(x)/g(x) = f(x) \cdot (1/g(x))$). We set $h(x) = 1/g(x)$ and replace (5.14)
by

    $|h(x) - h(y)| = \frac{|g(y) - g(x)|}{|g(y)| \cdot |g(x)|} \le \frac{|g(x) - g(y)|}{C^2}$.    □


Since the constant function and $f(x) = x$ are integrable (Example 5.5), the
above theorem implies that polynomials and rational functions (away from singularities)
are integrable. The following theorem was asserted by Cauchy (1823),
but was proved rigorously only some 50 years later with the notion of uniform
continuity.

(5.10) Theorem. If $f : [a, b] \to \mathbb{R}$ is continuous, then it is integrable.

Proof. The essential point is that $f$ is uniformly continuous (Theorem 4.5). This
means that for a given $\varepsilon > 0$ there exists a $\delta > 0$ such that

    $|x - y| < \delta \ \Rightarrow\ |f(x) - f(y)| < \varepsilon$.

We take a division $D$ satisfying $\max_i \delta_i < \delta$. For $x, y \in [x_{i-1}, x_i]$ we thus have
$|f(x) - f(y)| < \varepsilon$ and, by (5.13), $F_i - f_i \le \varepsilon$. This implies that

    $S(D) - s(D) = \sum_{i=1}^n (F_i - f_i)\delta_i \le \varepsilon \sum_{i=1}^n \delta_i = \varepsilon(b - a)$

and the integrability of $f(x)$ follows from Theorem 5.4. □

(5.11) Theorem. If $f : [a, b] \to \mathbb{R}$ is nondecreasing (or nonincreasing), then it is
integrable.

Proof. The smallest value of a nondecreasing function is at the left end point and
the largest at the right end point of the interval $[x_{i-1}, x_i]$. Hence, $f_i = f(x_{i-1})$,
$F_i = f(x_i)$ so that $f_{i+1} = F_i$ for $i = 1, \ldots, n - 1$. The idea is now to consider
equidistant divisions where the length of all subintervals is equal to $\delta$. We then
have

    $S(D) - s(D) = F_1\delta - f_1\delta + F_2\delta - f_2\delta + F_3\delta - f_3\delta + \ldots = (f(x_n) - f(x_0)) \cdot \delta < \varepsilon$,

if $\delta$ is sufficiently small. This proves the integrability of $f(x)$. □
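The telescoping in the proof of Theorem 5.11 works for discontinuous monotone functions as well. The sketch below sums $(F_i - f_i)\delta$ term by term for the floor function on $[0, 3.5]$ (an example of our own choosing, with three jumps) and compares the result with $(f(b) - f(a)) \cdot \delta$.

```python
import math


def darboux_gap_monotone(f, a, b, n):
    """S(D) - s(D) for a nondecreasing f on the equidistant division with
    n subintervals, summed term by term as in the proof."""
    delta = (b - a) / n
    xs = [a + i * delta for i in range(n + 1)]
    # For nondecreasing f: f_i = f(x_{i-1}) and F_i = f(x_i).
    return sum((f(xs[i]) - f(xs[i - 1])) * delta for i in range(1, n + 1))


f, a, b = math.floor, 0.0, 3.5   # a discontinuous nondecreasing function
for n in (7, 70, 700):
    gap = darboux_gap_monotone(f, a, b, n)
    telescoped = (f(b) - f(a)) * (b - a) / n
    print(n, gap, telescoped)    # the two columns agree up to rounding
```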

(5.12) Remark. If we change an integrable function at a finite number of points, 
the function remains integrable and the value of the integral does not change. This 
is seen by an argument similar to that of Example 5.7 above. 

(5.13) Remark. Let $a < b < c$ and assume that $f : [a, c] \to \mathbb{R}$ is a function whose
restrictions to $[a, b]$ and to $[b, c]$ are integrable. Then $f$ is integrable on $[a, c]$ and
we have

(5.16)    $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.

This holds because adding the Darboux sums for the restrictions to $[a, b]$ and $[b, c]$
yields a Darboux sum for $[a, c]$.

For $a > b$ or $a = b$ we define

(5.17)    $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$   and   $\int_a^a f(x)\,dx = 0$,

so that Eq. (5.16) is true for any triple $(a, b, c)$.


Inequalities and the Mean Value Theorem 

The following inequalities are often useful for estimating integrals. We have already
used them in Sect. II.10 to obtain the estimates (II.10.15).

(5.14) Theorem. If $f(x)$ and $g(x)$ are integrable on $[a, b]$ (with $a < b$) and if
$f(x) \le g(x)$ for all $x \in [a, b]$, then

    $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

Proof. The Riemann sums satisfy $\sum_{i=1}^n f(\xi_i)\delta_i \le \sum_{i=1}^n g(\xi_i)\delta_i$, because $\delta_i > 0$.
For $\max_i \delta_i \to 0$ we obtain the above inequality (see (5.12) and Theorem 1.6). □

(5.15) Corollary. For integrable functions we have

    $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$.

Proof. We apply Theorem 5.14 to $-|f(x)| \le f(x) \le |f(x)|$. □

By applying Corollary 5.15 to a product of two integrable functions $f(x) \cdot g(x)$
and using $|f(x) \cdot g(x)| \le M \cdot |g(x)|$, where $M = \sup_{x \in [a, b]} |f(x)|$, we
obtain the following useful estimate:

(5.18)    $\left| \int_a^b f(x) \cdot g(x)\,dx \right| \le \sup_{x \in [a, b]} |f(x)| \cdot \int_a^b |g(x)|\,dx$.

The next inequality is similar to (5.18), but treats the two functions $f$ and $g$
symmetrically.

(5.16) The Cauchy-Schwarz Inequality (Cauchy 1821 in $\mathbb{R}^n$, Bunyakovski 1859
for integrals, Schwarz 1885, §15, for double integrals). For integrable functions
$f(x)$ and $g(x)$ we have

(5.19)    $\left| \int_a^b f(x)g(x)\,dx \right| \le \sqrt{\int_a^b f^2(x)\,dx} \cdot \sqrt{\int_a^b g^2(x)\,dx}$.

Proof. By Theorem 5.9, we know that $f \cdot g$, $f^2$, and $g^2$ are integrable. Using
Theorem 5.14 and the linearity of the integral, we have

    $0 \le \int_a^b \bigl(f(x) - \gamma g(x)\bigr)^2\,dx$
    $\phantom{0} = \int_a^b f^2(x)\,dx - 2\gamma \int_a^b f(x)g(x)\,dx + \gamma^2 \int_a^b g^2(x)\,dx$.

We put $A = \int_a^b f^2(x)\,dx$, $B = \int_a^b f(x)g(x)\,dx$, $C = \int_a^b g^2(x)\,dx$, and we see
that $A - 2\gamma B + \gamma^2 C \ge 0$ for all real $\gamma$. For $C = 0$ this implies that $B = 0$. For
$C \ne 0$ the discriminant of the quadratic equation cannot be positive (see (I.1.12)).
Therefore, we must have $B^2 \le AC$, which is (5.19). □
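The quantities $A$, $B$, $C$ of this proof can be approximated by Riemann sums, for which the inequality $B^2 \le AC$ already holds in its discrete form. A minimal sketch, assuming midpoint tags on an equidistant division and the sample pair $f = \exp$, $g = \sin$ on $[0, 1]$ (both choices are ours):

```python
import math


def midpoint_integral(f, a, b, n=2000):
    """Riemann sum with the midpoints of an equidistant division as tags."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def cauchy_schwarz_terms(f, g, a, b):
    """Approximations of A, B, C from the proof of the inequality."""
    A = midpoint_integral(lambda x: f(x) ** 2, a, b)
    B = midpoint_integral(lambda x: f(x) * g(x), a, b)
    C = midpoint_integral(lambda x: g(x) ** 2, a, b)
    return A, B, C


A, B, C = cauchy_schwarz_terms(math.exp, math.sin, 0.0, 1.0)
print(B * B <= A * C)
# The quadratic A - 2*gamma*B + gamma^2*C is smallest at gamma = B / C.
gamma = B / C
print(A - 2 * gamma * B + gamma * gamma * C >= 0.0)
```

The minimum of the quadratic in $\gamma$ equals $A - B^2/C$, so its nonnegativity is the same statement as $B^2 \le AC$.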


(5.17) The Mean Value Theorem (Cauchy 1821). If $f : [a, b] \to \mathbb{R}$ is a continuous
function, then there exists $\xi \in [a, b]$ such that

(5.20)    $\int_a^b f(x)\,dx = f(\xi) \cdot (b - a)$.

Proof. Let $m$ and $M$ be the minimum and the maximum of $f(x)$ on $[a, b]$ (see
Theorem 3.6), so that $m \le f(x) \le M$ for all $x \in [a, b]$. Applying Theorem 5.14
and dividing by $(b - a)$ yields

    $m \le \frac{1}{b - a} \int_a^b f(x)\,dx \le M$.

The value $\int_a^b f(x)\,dx/(b - a)$ lies between $m = f(u)$ and $M = f(U)$. Therefore,
by Bolzano’s Theorem 3.5, we deduce the existence of $\xi \in [a, b]$ such that this
value equals $f(\xi)$. This proves Eq. (5.20). □
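For a concrete function the point $\xi$ can be located numerically. The following sketch assumes, in addition to continuity, that $f$ is increasing on $[a, b]$, so that a simple bisection (a constructive form of the intermediate value argument) applies; the mean value is approximated by a midpoint Riemann sum. The function name and parameters are hypothetical.

```python
def mean_value_point(f, a, b, n=3000, tol=1e-12):
    """Find xi in [a, b] with f(xi) equal to the mean of f, assuming f is
    continuous and increasing on [a, b] so that bisection applies."""
    h = (b - a) / n
    # Midpoint Riemann sum divided by (b - a): approximate mean value of f.
    mean = sum(f(a + (i + 0.5) * h) for i in range(n)) * h / (b - a)
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), mean


xi, mean = mean_value_point(lambda x: x * x, 0.0, 3.0)
print(xi, mean)  # for f(x) = x^2 on [0, 3]: mean 3, xi = sqrt(3)
```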

(5.18) Theorem (Cauchy 1821). Let $f : [a, b] \to \mathbb{R}$ be continuous and let
$g : [a, b] \to \mathbb{R}$ be an integrable function that is everywhere positive (or everywhere
negative). Then, there exists $\xi \in [a, b]$ such that

(5.21)    $\int_a^b f(x)g(x)\,dx = f(\xi) \int_a^b g(x)\,dx$.

Proof. Suppose that $g(x) > 0$ for all $x$ (otherwise replace $g$ by $-g$). In this situation,
we have

    $m \cdot g(x) \le f(x)g(x) \le M \cdot g(x)$   for $x \in [a, b]$,

where $m$ and $M$ are the minimum and maximum of $f(x)$. The rest of the proof is
the same as for the Mean Value Theorem. □


Integration of Infinite Series 


Until very recently it was believed, that the integral of a convergent se- 
ries ... is equal to the sum of the integrals of the individual terms, and 
Mr. Weierstrass was the first to observe . . . 

(Heine 1870, Ueber trig. Reihen, J. f. Math., vol. 70, p. 353) 

On several occasions we found it useful to integrate an infinite series term by
term (e.g., in the derivation of Mercator’s series (I.3.13) and in the examples of
Sect. II.6). This means that we exchanged integration with a limit of functions. We
will discuss here under what conditions this is permitted.

First Example. Let $r_1, r_2, r_3, \ldots$ be a sequence containing all rational numbers
between 0 and 1, for example

    $\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{2}{4}, \frac{3}{4}, \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}, \frac{1}{6}, \ldots$

We then define

(5.22)    $f_n(x) = \begin{cases} 1 & \text{if } x \in \{r_1, r_2, \ldots, r_n\} \\ 0 & \text{else.} \end{cases}$

By Remark 5.12, each function $f_n : [0, 1] \to \mathbb{R}$ is integrable with integral zero.
However, the limit function $f(x)$, which is Dirichlet’s function of Example 5.6, is
not integrable. (The Lebesgue integral will get rid of this difficulty.)

Second Example. The graphs of the functions

(5.23)    $f_n(x) = \begin{cases} n^2 x & 0 \le x \le 1/n \\ 2n - n^2 x & 1/n \le x \le 2/n \\ 0 & 2/n \le x \le 2 \end{cases}$

are triangles with decreasing bases and increasing
altitudes with the property that

    $\int_0^2 f_n(x)\,dx = 1$   for all $n$.

However, the limit function is $f(x) = 0$ for all
$x \in [0, 2]$. Here, $f(x)$ is integrable, but

    $\lim_{n\to\infty} \int_0^2 f_n(x)\,dx \ne \int_0^2 \lim_{n\to\infty} f_n(x)\,dx$.
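The second example can be verified exactly, because the triangle functions are piecewise linear. The sketch below takes the domain to be $[0, 2]$, as in the last branch of (5.23); the helper names are ours. It computes each integral with the trapezoidal rule on the break points (which is exact for piecewise linear functions) and evaluates $f_n$ at a fixed point for growing $n$.

```python
from fractions import Fraction


def f_n(n, x):
    """The triangle function (5.23) on [0, 2]."""
    x = Fraction(x)
    if x <= Fraction(1, n):
        return n * n * x
    if x <= Fraction(2, n):
        return 2 * n - n * n * x
    return Fraction(0)


def integral_f_n(n):
    """Exact integral of f_n: f_n is piecewise linear, so the trapezoidal
    rule on the break points 0, 1/n, 2/n, 2 is exact."""
    pts = [Fraction(0), Fraction(1, n), Fraction(2, n), Fraction(2)]
    return sum((pts[i + 1] - pts[i]) * (f_n(n, pts[i]) + f_n(n, pts[i + 1])) / 2
               for i in range(3))


x = Fraction(1, 10)
print([integral_f_n(n) for n in (1, 5, 50)])      # every integral equals 1
print([f_n(n, x) for n in (5, 10, 20, 21, 50)])   # zero as soon as 2/n <= x
```

The maximum $f_n(1/n) = n$ is unbounded, so the convergence to zero is pointwise but not uniform.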

(5.19) Theorem. Consider a sequence $f_n(x)$ of integrable functions and suppose
that it converges uniformly on $[a, b]$ to a function $f(x)$. Then $f : [a, b] \to \mathbb{R}$ is
integrable and

    $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$.

Proof. Uniform convergence means that, for a given $\varepsilon > 0$, there exists an integer
$N$ such that for all $n \ge N$ and for all $x \in [a, b]$ we have $|f_n(x) - f(x)| < \varepsilon$.
Consequently, we have for all $x, y \in [a, b]$ that

    $|f(x) - f(y)| \le |f_N(x) - f_N(y)| + 2\varepsilon$.

Applying (5.13), we see that

    $(F_i - f_i) \le (F_{Ni} - f_{Ni}) + 2\varepsilon$,

where, as in (5.5), we have used the notation $F_{Ni} = \sup_{x_{i-1} \le x \le x_i} f_N(x)$ and
$f_{Ni} = \inf_{x_{i-1} \le x \le x_i} f_N(x)$. The function $f_N(x)$ is integrable, so that for a
suitable division of $[a, b]$ the difference of the upper and the lower Darboux
sums, i.e., $\sum_i (F_{Ni} - f_{Ni})\delta_i$, is smaller than $\varepsilon$ (Theorem 5.4). This implies that
$\sum_i (F_i - f_i)\delta_i < \varepsilon(1 + 2(b - a))$ and $f(x)$ is seen to be integrable.

Once the integrability of the limit function $f(x)$ is established, Corollary 5.15
implies that for $n \ge N$

    $\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx \le \varepsilon(b - a)$.

This implies the conclusion of the theorem. □

(5.20) Corollary. Consider a sequence $f_n(x)$ of integrable functions and suppose
that the series $\sum_{n=0}^\infty f_n(x)$ converges uniformly on $[a, b]$. Then, we have

    $\sum_{n=0}^\infty \int_a^b f_n(x)\,dx = \int_a^b \sum_{n=0}^\infty f_n(x)\,dx$. □

FIGURE 5.4. Riemann’s example of an integrable function


Riemann’s Example. 

Since these functions have never been considered yet, it will be useful to 
start from a particular example. (Riemann 1854, Werke, p. 228) 

Riemann (1854), in order to demonstrate the power of his theory of integration,
proposed the following example of a function that is discontinuous in every interval
(see Fig. 5.4):

(5.24)    $f(x) = \sum_{n=1}^\infty \frac{B(nx)}{n^2}$,   where   $B(x) = \begin{cases} x - \langle x \rangle & \text{if } x \ne k/2 \\ 0 & \text{if } x = k/2 \end{cases}$

and $\langle x \rangle$ denotes the nearest integer to $x$. This function is discontinuous at
$x = 1/2, 1/4, 3/4, 1/6, 3/6, 5/6, \ldots$; nevertheless, the series (5.24) converges uniformly
by Theorem 4.3 and the terms $f_n(x) = B(nx)/n^2$ are integrable by Remark 5.13.
Hence, $f$ is integrable.
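Riemann's function can be explored numerically through its partial sums. The sketch below assumes that (5.24) is Riemann's series $f(x) = \sum_{n \ge 1} B(nx)/n^2$ with $B$ as described above; the function names and the truncation at 2000 terms are our own choices. It estimates the jump at $x = 1/2$, where every term with odd $n$ drops by 1, so that the jump of the full series is $\sum_{n \text{ odd}} 1/n^2 = \pi^2/8$.

```python
import math


def B(x):
    """x minus the nearest integer, and 0 when x is an odd multiple of 1/2."""
    frac = x - math.floor(x)          # fractional part in [0, 1)
    if frac == 0.5:
        return 0.0
    return frac if frac < 0.5 else frac - 1.0


def riemann_partial_sum(x, terms=2000):
    """Partial sum of the series sum_{n >= 1} B(n x) / n^2."""
    return sum(B(n * x) / (n * n) for n in range(1, terms + 1))


h = 1e-9
jump = riemann_partial_sum(0.5 - h) - riemann_partial_sum(0.5 + h)
print(jump, math.pi ** 2 / 8)  # the partial-sum jump is close to pi^2 / 8
```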


Exercises 

5.1 For the function


f{x ) = { 1 

l x 



otherwise 




and a given $\varepsilon > 0$, say $\varepsilon = 0.01$, construct explicitly a division for which
$S(D) - s(D) < \varepsilon$. This will make clear that $f$ is integrable in the sense of
Riemann.

5.2 Consider the function $f(x) = x^2$ on the interval $[0, 1]$. Compute the lower
and upper Darboux sums for the equidistant division $x_i = i/n$, $i = 0, 1, \ldots, n$.
Conclude from the results obtained that $f$ is integrable.

5.3 Show that the numerical approximations obtained from the trapezoidal rule
(see Sect. II.6),

    $\int_a^b f(x)\,dx \approx h \left( \frac{f(\xi_0)}{2} + f(\xi_1) + f(\xi_2) + f(\xi_3) + \ldots + f(\xi_{N-1}) + \frac{f(\xi_N)}{2} \right)$

($h = (b - a)/N$ and $\xi_i = a + ih$), as well as from Simpson’s rule ($N$ even),

    $\int_a^b f(x)\,dx \approx \frac{h}{3} \bigl( f(\xi_0) + 4f(\xi_1) + 2f(\xi_2) + 4f(\xi_3) + \ldots + f(\xi_N) \bigr)$,

are Riemann sums for a certain division $D$. Therefore, convergence of these
methods is ensured for $N \to \infty$ for all Riemann integrable functions.

5.4 (Dini 1878, Chap. 13). Show that

    $\int_0^\pi \ln(1 - 2a \cos x + a^2)\,dx = 0$   for $a^2 < 1$,
    $\int_0^\pi \ln(1 - 2a \cos x + a^2)\,dx = \pi \ln a^2$   for $a^2 > 1$,

by computing Riemann sums for an equidistant division $x_i = i\pi/n$, with $\xi_i$
the left end point $x_{i-1}$. The Riemann sums will become the logarithm of a
product with which we are familiar (see Sect. I.5).

5.5 Let $f : [a, b] \to \mathbb{R}$ satisfy i) $f$ is continuous, ii) $\forall\, x \in [a, b]$ we have
$f(x) \ge 0$, and iii) $\exists\, x_0 \in (a, b)$ with $f(x_0) > 0$. Then, show that

(5.25)    $\int_a^b f(x)\,dx > 0$.

Show with the help of counterexamples that each of the three hypotheses i),
ii), and iii) is necessary for proving (5.25).

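5.6 Compute the integrals

    $\int_0^{\pi/2} \sin^{2n} x\,dx = \frac{1 \cdot 3 \cdot 5 \cdot \ldots \cdot (2n - 1)}{2 \cdot 4 \cdot 6 \cdot \ldots \cdot 2n} \cdot \frac{\pi}{2}$,

    $\int_0^{\pi/2} \sin^{2n+1} x\,dx = \frac{2 \cdot 4 \cdot 6 \cdot \ldots \cdot 2n}{3 \cdot 5 \cdot 7 \cdot \ldots \cdot (2n + 1)}$.

Then, use $0 < \sin x < 1$ for $0 < x < \pi/2$ and Theorem 5.14 to establish

    $\int_0^{\pi/2} \sin^{2n+1} x\,dx \le \int_0^{\pi/2} \sin^{2n} x\,dx \le \int_0^{\pi/2} \sin^{2n-1} x\,dx$.

The above values inserted into these inequalities lead to a proof of Wallis’s
product (I.5.27) with a rigorous error estimate.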

5.7 Show that

    $\frac{1}{2} \int_0^1 x^4 (1 - x)^4\,dx \le \int_0^1 \frac{x^4 (1 - x)^4}{1 + x^2}\,dx \le \int_0^1 x^4 (1 - x)^4\,dx$.

The actual computation of these integrals leads to an amusing result (old
souvenirs from Sect. I.6).

Hint. To calculate $\int_0^1 x^4 (1 - x)^4\,dx$ see Exercise II.4.3.

5.8 Show that the series

    $\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \ldots$

converges uniformly on $A = [-b, b]$ for each $b$ with $0 < b < 1$. Hence, this
series can be integrated term by term on $A = [0, b]$ (or on $A = [-b, 0]$) and
leads to the well-known series for $\arctan b$.



FIGURE 5.5. Exchange of lim and integral 


5.9 For the following sequences of functions $f_n : [0, 1] \to \mathbb{R}$ (Fig. 5.5),

    a) $f_n(x) = \frac{nx}{(1 + n^2 x^2)^2}$,   b) $f_n(x) = \frac{n^2 x}{(1 + n^2 x^2)^2}$,

compute $\lim_{n\to\infty} f_n(x)$ (distinguish the cases $x = 0$ and $x \ne 0$). Find the
maximal point of $f_n(x)$ and decide whether convergence is uniform. Finally,
check whether the following equality holds:

    $\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx$.

III.6 Differentiable Functions


. . . rigor, which I wanted to be absolute in my Cours d’analyse, . . .
(Cauchy 1829, Leçons)

The total variation $f(x + h) - f(x)$ . . . can in general be decomposed into
two terms . . . (Weierstrass 1861)

The derivative of a function was introduced and discussed in Sect. II.1. Now that
we have the notion of limit at our disposal, it is possible to give a precise definition.

(6.1) Definition (Cauchy 1821). Let $I$ be an interval and let $x_0 \in I$. The function
$f : I \to \mathbb{R}$ is differentiable at $x_0$ if the limit

(6.1)    $f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$

exists. The value of this limit is the derivative of $f$ at $x_0$ and is denoted by $f'(x_0)$.

If the function $f$ is differentiable at all points of $I$ and if $f' : I \to \mathbb{R}$ is
continuous, then $f$ is called continuously differentiable.

Sometimes it is advantageous to write $x = x_0 + h$, so that

(6.2)    $f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}$.

One can also, for a given $x_0$, consider the function $r : I \to \mathbb{R}$ defined by
$r(x_0) = 0$ and

(6.3)    $r(x) = \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0)$   for $x \ne x_0$.

Then, Eq. (6.1) is equivalent to $\lim_{x \to x_0} r(x) = 0$ and we have the following
criterion.


(6.2) Weierstrass’s Formulation (Weierstrass 1861, see the above quotation). A
function $f(x)$ is differentiable at $x_0$ if and only if there exists a number $f'(x_0)$
and a function $r(x)$, continuous at $x_0$ and satisfying $r(x_0) = 0$, such that

(6.4)    $f(x) = f(x_0) + f'(x_0)(x - x_0) + r(x)(x - x_0)$. □

Equation (6.4) has the advantage of containing no limit (this is replaced
by the continuity of $r(x)$) and of exhibiting the equation of the tangent line
$y = f(x_0) + f'(x_0)(x - x_0)$ to $f(x)$ at $x = x_0$. Moreover, it will be the basis for the
differentiability theory of functions of several variables.

Still simpler formulas and proofs are obtained, if the two terms in Eq. (6.4)
are collected by setting

(6.5)    $\varphi(x) = f'(x_0) + r(x)$.



(6.3) Carathéodory’s Formulation (Carathéodory 1950, p. 121). A function $f(x)$
is differentiable at $x_0$ if and only if there exists a function $\varphi(x)$, continuous at $x_0$,
such that

(6.6)    $f(x) = f(x_0) + \varphi(x)(x - x_0)$.

The value $\varphi(x_0)$ is the derivative $f'(x_0)$ of $f$ at $x_0$.

We see immediately from (6.6) that if $f$ is differentiable at $x_0$, then it is also
continuous at $x_0$. Furthermore, since from (6.5) and (6.3) (or directly from (6.6))

(6.7)    $\varphi(x) = \frac{f(x) - f(x_0)}{x - x_0}$   for $x \ne x_0$

is uniquely determined for $x \ne x_0$, the derivative $f'(x_0)$ is uniquely determined
if it exists.


Remarks and Examples. 1. Obviously, the functions $f(x) = 1$ and $f(x) = x$ are
differentiable. The differentiability of $f(x) = x^2$ follows, for example, from (6.6)
with the identity $x^2 - x_0^2 = (x + x_0)(x - x_0)$ (see also Sect. II.1).

2. We emphasize that differentiability at $x_0$ is a local property. Changing the
function outside $(x_0 - \varepsilon, x_0 + \varepsilon)$ for some $\varepsilon > 0$ changes neither its differentiability
at $x_0$ nor the derivative $f'(x_0)$.

3. If $I = [a, b]$ is a closed interval and $x_0 = a$, then (6.1) should be replaced
by the right-sided limit.

4. Consider the function $f(x) = |x|$ (absolute value). At $x_0 > 0$, it is
differentiable with $f'(x_0) = 1$; at $x_0 < 0$ it is also differentiable, but with
derivative $f'(x_0) = -1$. This function is not differentiable at $x_0 = 0$, because
$f(x)/x = |x|/x$ does not have a limit for $x \to 0$.

5. The function

    $f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational or integer} \\ 1/q^2 & \text{if } x = p/q \text{ (reduced fraction)} \end{cases}$

is discontinuous at every non-integer rational $x_0$. It
is, nevertheless, differentiable at $x_0 = 0$, since the
function $\varphi(x)$ of Eq. (6.6) becomes $\varphi(x) = f(x)/x$.
Since $|f(x)| \le |x|^2$, we have $\lim_{x \to 0} \varphi(x) = 0$ and
$f'(x_0) = 0$.
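The estimate behind Remark 5 can be tested on rational arguments, where the function is nonzero. The sketch below (function names are ours) evaluates Carathéodory's slope function $\varphi(x) = f(x)/x$ at $x_0 = 0$ for many reduced fractions and checks $|\varphi(x)| \le |x|$; at irrational $x$ the value $\varphi(x) = 0$ satisfies the bound trivially and is not represented here.

```python
from fractions import Fraction


def f(x):
    """The function of Remark 5 on rational arguments: 1/q^2 at a reduced
    non-integer fraction p/q, and 0 at integers."""
    x = Fraction(x)
    return Fraction(0) if x.denominator == 1 else Fraction(1, x.denominator ** 2)


def phi(x):
    """Slope function (6.7) at x0 = 0, for rational x != 0."""
    x = Fraction(x)
    return (f(x) - f(0)) / (x - 0)


samples = [Fraction(p, q) for q in range(2, 40) for p in range(-q + 1, q) if p != 0]
print(all(abs(phi(x)) <= abs(x) for x in samples))
```

For $x = p/q$ in lowest terms one finds $|\varphi(x)| = 1/(|p|\,q) \le |p|/q = |x|$, which is the inequality $|f(x)| \le |x|^2$ divided by $|x|$.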



(6.4) Theorem. If $f : (a, b) \to \mathbb{R}$ is differentiable at $x_0 \in (a, b)$ and $f'(x_0) > 0$,
then there exists $\delta > 0$ such that

    $f(x) > f(x_0)$   for all $x$ satisfying $x_0 < x < x_0 + \delta$,
    $f(x) < f(x_0)$   for all $x$ satisfying $x_0 - \delta < x < x_0$.

If the function possesses a maximum (or minimum) at $x_0$, then $f'(x_0) = 0$.

Proof. $f'(x_0) > 0$ means that $\varphi(x_0) > 0$ (see (6.6)). By continuity, $\varphi(x) > 0$ in a
neighborhood of $x_0$. Now the stated inequalities follow from (6.7).

If the function possesses a maximum at $x_0$, then we have $f(x) \le f(x_0)$ on
both sides of $x_0$. This is only possible if $f'(x_0) = 0$ (apply the first part to $f$ if
$f'(x_0) > 0$, and to $-f$ if $f'(x_0) < 0$). □

(6.5) Remark. The statement of Theorem 6.4 does not imply that a function, satisfying
$f'(x_0) > 0$, is monotonically increasing in a neighborhood of $x_0$. As a
counterexample, consider the function $f(x)$ (see Fig. 6.1), given by $f(0) = 0$ and

    $f(x) = x + x^2 \sin(1/x^2)$   for $x \ne 0$.

It is differentiable everywhere and satisfies $f'(0) = 1$ (because $f(x) = x + r(x) \cdot x$
with $|r(x)| \le |x|$). For $x \ne 0$ the derivative

    $f'(x) = 1 + 2x \sin\bigl(1/x^2\bigr) - \frac{2}{x} \cos\bigl(1/x^2\bigr)$

oscillates strongly near the origin. Hence, even though $f(x)$ is contained between
two parabolas, there are points with negative derivatives arbitrarily close to the
origin. By Theorem 6.4, there exist points $\xi_1 < \xi_2$, arbitrarily close to 0, for
which $f(\xi_1) > f(\xi_2)$.

We shall show later (Corollary 6.12) that, if $f'(x) > 0$ for all $x \in (a, b)$, the
function is monotonically increasing. Thus, this counterexample is only possible
because $f$ is not continuously differentiable.
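The points with negative derivative can be written down explicitly. A minimal sketch, assuming the sample points $x_k = 1/\sqrt{2\pi k}$ (where $\cos(1/x^2) = 1$, so that $f'(x_k) = 1 - 2/x_k < 0$ up to rounding) and a step $h$ that is small compared with the local oscillation; both choices are ours:

```python
import math


def f(x):
    """f(x) = x + x^2 sin(1/x^2) for x != 0 and f(0) = 0."""
    return 0.0 if x == 0.0 else x + x * x * math.sin(1.0 / (x * x))


def f_prime(x):
    """Derivative for x != 0: 1 + 2x sin(1/x^2) - (2/x) cos(1/x^2)."""
    u = 1.0 / (x * x)
    return 1.0 + 2.0 * x * math.sin(u) - (2.0 / x) * math.cos(u)


for k in (1, 10, 100):
    x = 1.0 / math.sqrt(2.0 * math.pi * k)   # here cos(1/x^2) = 1
    h = 0.01 * x ** 3                        # small compared with the oscillation
    # Negative slope at x, and a pair xi_1 = x - h < xi_2 = x + h
    # with f(xi_1) > f(xi_2), as predicted by Theorem 6.4.
    print(x, f_prime(x), f(x - h) > f(x + h))
```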



FIGURE 6.1. Graph of the function $y = x + x^2 \sin(1/x^2)$ and its derivative


(6.6) Theorem. If $f$ and $g$ are differentiable at $x_0$, then so are

    $f + g$,   $f \cdot g$,   $f/g$ (if $g(x_0) \ne 0$).

The formulas of Sect. II.1 for their derivatives are correct.

Proof. We shall present two different proofs for the product $f \cdot g$. For $f + g$ and
$f/g$ the proofs are similar.

The first proof is based on the identity

    $\frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0} = f(x)\,\frac{g(x) - g(x_0)}{x - x_0} + g(x_0)\,\frac{f(x) - f(x_0)}{x - x_0}$,

which is obtained by adding and subtracting the term $f(x)g(x_0)$. Using the
continuity of $f$ at $x_0$ (Eq. (6.4)), the differentiability of $f$ and $g$, and Theorem
1.5, we see that for $x \to x_0$ the expression on the right has the finite limit
$f(x_0)g'(x_0) + g(x_0)f'(x_0)$. Hence, the product $f \cdot g$ is differentiable at $x_0$.

Our second proof is based on Carathéodory’s formulation 6.3. By hypothesis,
we have

(6.8)    $f(x) = f(x_0) + \varphi(x)(x - x_0)$,   $\varphi(x_0) = f'(x_0)$,
         $g(x) = g(x_0) + \psi(x)(x - x_0)$,   $\psi(x_0) = g'(x_0)$.

We multiply both equations of (6.8), and obtain

    $f(x)g(x) = f(x_0)g(x_0) + \Bigl(f(x_0)\psi(x) + g(x_0)\varphi(x) + \varphi(x)\psi(x)(x - x_0)\Bigr)(x - x_0)$.

The function in tall brackets is evidently continuous at $x_0$ and its value for $x = x_0$
is $f(x_0)g'(x_0) + g(x_0)f'(x_0)$. □

(6.7) Theorem (Chain rule for composite functions). If $y = f(x)$ is differentiable
at $x_0$ and if $z = g(y)$ is differentiable at $y_0 = f(x_0)$, then the composite function
$(g \circ f)(x) = g(f(x))$ is differentiable at $x_0$, and we have

(6.9)    $(g \circ f)'(x_0) = g'(y_0) \cdot f'(x_0)$.

Many of our students will appreciate the pithy elegance of this
proof. (Kuhn 1991)

Proof. We use Eq. (6.6) to write the hypothesis in the form

    $f(x) - f(x_0) = \varphi(x)(x - x_0)$,   $\varphi(x_0) = f'(x_0)$,
    $g(y) - g(y_0) = \psi(y)(y - y_0)$,   $\psi(y_0) = g'(y_0)$.

Inserting $y - y_0 = f(x) - f(x_0)$ from the first equation into the second, we obtain

    $g(f(x)) - g(f(x_0)) = \psi(f(x))\,\varphi(x)(x - x_0)$.

The function $\psi(f(x))\,\varphi(x)$ is again continuous at $x_0$, and its value for $x = x_0$ is
$g'(f(x_0)) \cdot f'(x_0)$. □
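The factorization used in this proof can be followed on a polynomial example, where the slope functions are known in closed form. The sketch below takes $f(x) = x^2$ and $g(y) = y^3$ (our own choice) and verifies in exact arithmetic that $\psi(f(x))\,\varphi(x)$ is the slope function of $g(f(x)) = x^6$, with value $6x_0^5$ at $x = x_0$.

```python
from fractions import Fraction


def phi(x, x0):
    """Slope function of f(x) = x^2 at x0: x^2 - x0^2 = (x + x0)(x - x0)."""
    return x + x0


def psi(y, y0):
    """Slope function of g(y) = y^3 at y0:
    y^3 - y0^3 = (y^2 + y*y0 + y0^2)(y - y0)."""
    return y * y + y * y0 + y0 * y0


def composite_slope(x, x0):
    """psi(f(x)) * phi(x), the slope function of g(f(x)) = x^6 at x0."""
    return psi(x * x, x0 * x0) * phi(x, x0)


x0 = Fraction(3, 2)
for x in (Fraction(2), Fraction(7, 4), Fraction(3, 2)):
    lhs = x ** 6 - x0 ** 6
    print(lhs == composite_slope(x, x0) * (x - x0))
print(composite_slope(x0, x0) == 6 * x0 ** 5)   # g'(f(x0)) * f'(x0)
```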

(6.8) Theorem (Inverse functions). Let $f : I \to J$ be bijective, continuous, and
differentiable at $x_0 \in I$, and suppose that $f'(x_0) \ne 0$. Then, the inverse function
$f^{-1} : J \to I$ is differentiable at $y_0 = f(x_0)$, and we have

(6.10)    $(f^{-1})'(y_0) = \frac{1}{f'(x_0)}$.

Proof. In Carathéodory’s formulation (6.6), we have by hypothesis

    $f(x) - f(x_0) = \varphi(x)(x - x_0)$,   $\varphi(x_0) = f'(x_0)$;

we replace $x$ and $x_0$ by $f^{-1}(y)$ and $f^{-1}(y_0)$, and $f(x)$ and $f(x_0)$ by $y$ and $y_0$,
and get

    $y - y_0 = \varphi(f^{-1}(y))\,\bigl(f^{-1}(y) - f^{-1}(y_0)\bigr)$.

From the proof of Theorem 3.9 it follows that $f^{-1}(y)$ is continuous at $y_0$. Because
by hypothesis $\varphi(f^{-1}(y_0)) \ne 0$, we therefore have $\varphi(f^{-1}(y)) \ne 0$ in a
neighborhood of $y_0$ and we may divide this formula to obtain

    $f^{-1}(y) - f^{-1}(y_0) = \frac{1}{\varphi(f^{-1}(y))}\,(y - y_0)$.

This concludes the proof, since the function $1/\varphi(f^{-1}(y))$ is continuous at $y_0$. □


The Fundamental Theorem of Differential Calculus 

Formula (II.4.6) is the central result of all the computations of Sect. II.4. We shall 
give here a rigorous proof of this result. In particular, we shall show that every 
continuous function f(x) has a primitive, which is unique up to an additive con- 
stant. 

(6.9) Theorem (Existence of a primitive). Let $f : [a,b] \to \mathbb{R}$ be a continuous
function. The function

(6.11)    $F(x) = \int_a^x f(t)\,dt$

(which exists by Theorem 5.10) is differentiable on $(a, b)$ and satisfies $F'(x) = f(x)$.
Hence, it is a primitive of $f(x)$.

Proof. By Eq. (5.16), we have

(6.12)    $F(x) - F(x_0) = \int_{x_0}^{x} f(t)\,dt.$

Applying the Mean Value Theorem 5.17, we get

    $F(x) - F(x_0) = f(\xi)(x - x_0),$

where $\xi = \xi(x, x_0)$ lies between $x$ and $x_0$. For $x \to x_0$ the value $\xi(x, x_0)$
necessarily tends to $x_0$, so that by continuity of $f$ at $x_0$, we have
$\lim_{x \to x_0} f(\xi) = f(x_0)$.

This proves (see (6.6)) the differentiability of $F(x)$, with $F'(x_0) = f(x_0)$. □

Uniqueness of Primitives.

This was supplied by the mean value theorem; and it was Cauchy’s great 
service to have recognized its fundamental importance. . . . because of this, 
we adjudge Cauchy as the founder of exact infinitesimal calculus. 

(F. Klein 1908, Engl. ed. p. 213) 
See the beautiful proof of this theorem due to Mr. O. Bonnet, in the Traité 
de Calcul différentiel et intégral of Mr. Serret, vol. I, p. 17. 

(Darboux 1875, p. 111) 

Our next aim is to prove the uniqueness (up to an additive constant) of the primitive.
The following concatenation of theorems, which accomplishes this task, has
been one of the cornerstones of the foundations of Analysis since Serret’s book
(1868; Serret attributes these ideas to O. Bonnet; see the quotations).

(6.10) Theorem (Rolle 1690). Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$,
differentiable on $(a, b)$, and such that $f(a) = f(b)$. Then, there exists a $\xi \in (a, b)$ such
that

(6.13)    $f'(\xi) = 0.$


Proof. From Theorem 3.6, we know there exist $u, U \in [a, b]$ such that
$f(u) \le f(x) \le f(U)$ for all $x \in [a, b]$. We now distinguish two situations.

If $f(u) = f(U)$, then $f(x)$ is constant and its derivative is zero everywhere.
If $f(u) < f(U)$, then at least one of the two values (say $f(U)$) is different
from $f(a) = f(b)$. We then have $a < U < b$, and by Theorem 6.4, $f'(U) = 0$. □

(6.11) Theorem (Lagrange 1797). Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and
differentiable on $(a, b)$. Then, there exists a number $\xi \in (a, b)$ such that

(6.14)    $f(b) - f(a) = f'(\xi)(b - a).$

Proof. The idea is to subtract from $f(x)$ the straight line connecting the points
$(a, f(a))$ and $(b, f(b))$, of slope $(f(b) - f(a))/(b - a)$, and to apply Rolle’s
Theorem (Fig. 6.2). We define

(6.15)    $h(x) = f(x) - \Bigl(f(a) + (x - a)\,\frac{f(b) - f(a)}{b - a}\Bigr).$

Because of $h(a) = h(b) = 0$ and

    $h'(x) = f'(x) - \frac{f(b) - f(a)}{b - a},$

Eq. (6.14) follows from $h'(\xi) = 0$ (Theorem 6.10). □
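To see Eq. (6.14) at work, one can locate a point $\xi$ numerically. The sketch below is a hedged illustration, not part of the text: the function $f(x) = x^3$ on $[0, 2]$ and the helper `mean_value_point` are assumptions, and the bisection relies on the extra assumption that $f'(x)$ minus the chord slope changes sign on the interval.

```python
def mean_value_point(derivative, a, b, slope, tol=1e-12):
    """Bisection for derivative(xi) = slope on [a, b].

    Assumes derivative(x) - slope changes sign between a and b.
    """
    lo, hi = a, b
    g_lo = derivative(lo) - slope
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = derivative(mid) - slope
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def f(x):
    return x ** 3


def df(x):
    return 3 * x ** 2


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)      # slope of the chord, here 4
xi = mean_value_point(df, a, b, slope)
print("xi =", xi, " f'(xi) =", df(xi), " chord slope =", slope)
```

For this example the exact value is $\xi = 2/\sqrt{3}$, which lies strictly inside $(0, 2)$ as the theorem requires.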


(6.12) Corollary. Let $f, g : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable
on $(a, b)$. We then have

a) if $f'(\xi) = 0$ for all $\xi \in (a, b)$, then $f(x) = C$ (constant);

b) if $f'(\xi) = g'(\xi)$ for all $\xi \in (a, b)$, then $f(x) = g(x) + C$;

c) if $f'(\xi) > 0$ for all $\xi \in (a, b)$, then $f(x)$ is monotonically increasing, i.e.,
$f(x_1) < f(x_2)$ for $a \le x_1 < x_2 \le b$; and

d) if $|f'(\xi)| \le M$ for all $\xi \in (a, b)$, then $|f(x_1) - f(x_2)| \le M|x_1 - x_2|$ for
$x_1, x_2 \in [a, b]$.

Proof. Applying Eq. (6.14) to the interval $[a, x]$ yields statement (a) with $C = f(a)$.
Statement (b) follows from (a). The remaining two statements are obtained
from Theorem 6.11 applied to the interval $[x_1, x_2]$. □


(6.13) The Fundamental Theorem of Differential Calculus. Let $f(x)$ be a continuous
function on $[a, b]$. Then, there exists a primitive $F(x)$ of $f(x)$, unique up
to an additive constant, and we have

(6.16)    $\int_a^b f(x)\,dx = F(b) - F(a).$

Proof. The existence of $F(x)$ is clear from Theorem 6.9. Uniqueness (up to a
constant) is a consequence of Corollary 6.12b. If $F(x)$ is an arbitrary primitive of
$f(x)$, then we have $F(x) = \int_a^x f(t)\,dt + C$. Setting $x = a$ yields $C = F(a)$, and
Eq. (6.16) is obtained on setting $x = b$. □
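A quick numerical experiment illustrates both halves of the theorem. The Python sketch below is an assumption-laden example rather than anything taken from the text: it uses $f = \cos$ with the known primitive $\sin$, a midpoint Riemann sum (`midpoint_integral`, a hypothetical helper) as a stand-in for the integral, and a central difference quotient as a stand-in for the derivative.

```python
import math


def midpoint_integral(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


f = math.cos            # continuous integrand (illustrative choice)
F = math.sin            # a known primitive of cos
a, b = 0.0, 2.0

integral = midpoint_integral(f, a, b)
print("integral  :", integral)
print("F(b)-F(a) :", F(b) - F(a))


def G(x):
    """Numerical version of the primitive (6.11) with lower limit a."""
    return midpoint_integral(f, a, x)


x0, h = 1.0, 1e-3
quotient = (G(x0 + h) - G(x0 - h)) / (2 * h)
print("difference quotient of G at x0:", quotient, " f(x0):", f(x0))
```

The first two printed numbers should agree closely, as in (6.16), and the difference quotient of the numerical primitive should be close to $f(x_0)$, as in Theorem 6.9.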


Fig. 6.3 shows the impressive genealogical tree of the theorems that are
needed for a rigorous proof of the fundamental theorem. If Leibniz had known
about this diagram, he might not have had the courage to state and use this
theorem.

The “Fundamental Theorem of Differential Calculus” allows us to formulate
theorems of Differential Calculus (Sect. III.6) as theorems of Integral Calculus
(Sect. III.5) and vice versa. This fact was exploited in Sect. II.4 on several
occasions. “Integration by Substitution” (Eq. (II.4.14)) and “Integration by Parts”
(Eq. (II.4.20)) now have a sound theoretical basis. One has only to require that the
functions involved be continuous, so that the integrals exist.

[Figure: tree diagram rooted at the Fundamental Theorem of Calculus; legible nodes
include Rem. 5.13, Thm. 6.4, Thm. 5.17 (Mean Value), and Thm. 4.5 (Uniform Continuity)]

FIGURE 6.3. Genealogical tree of the Fundamental Theorem


The Rules of de L’Hospital 

. . . entirely above the vain glory, which most scientists so avidly seek . . . 

(Fontenelle’s opinion concerning
Guillaume-François-Antoine de L’Hospital, Marquis de Sainte-Mesme et
du Montellier, Comte d’Antremonts, Seigneur d’Ouques, 1661-1704)
Besides, I acknowledge that I owe very much to the bright minds of
the Bernoulli brothers, especially to the young one presently Professor in
Groningen. I have made free use of their discoveries . . .

(de L’Hospital 1696)

We start with the following generalization of Lagrange’s Theorem 6.11.

(6.14) Theorem (Cauchy 1821). Let $f : [a,b] \to \mathbb{R}$ and $g : [a,b] \to \mathbb{R}$ be
continuous on $[a, b]$ and differentiable on $(a, b)$. If $g'(x) \ne 0$ for $a < x < b$, then
$g(b) \ne g(a)$ and there exists $\xi \in (a, b)$ such that

(6.17)    $\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(\xi)}{g'(\xi)}.$

Proof. We first observe that $g(b) \ne g(a)$ by Rolle’s theorem, since $g'(\xi) \ne 0$ for
all $\xi \in (a, b)$. We then note that for $g(x) = x$ this result reduces to Theorem 6.11.
Inspired by the proof of this theorem, we replace (6.15) by

(6.18)    $h(x) = f(x) - \Bigl(f(a) + (g(x) - g(a))\,\frac{f(b) - f(a)}{g(b) - g(a)}\Bigr).$

The conditions of Rolle’s Theorem 6.10 are satisfied, and consequently there
exists $\xi \in (a, b)$ with $h'(\xi) = 0$. This is equivalent to (6.17). □


Problem. Suppose we want to compute the limit of a quotient $f(x)/g(x)$. If both
functions, $f(x)$ and $g(x)$, tend to $0$ or to $\infty$ when $x \to b$, then we are confronted
with undetermined expressions of the form

    $\frac{0}{0}$  or  $\frac{\infty}{\infty}$.

The following theorems and examples show how such situations can be handled.


(6.15) Theorem (Joh. Bernoulli 1691/92, de L’Hospital 1696). Let $f : (a, b) \to \mathbb{R}$
and $g : (a, b) \to \mathbb{R}$ be differentiable on $(a, b)$ and suppose that $g'(x) \ne 0$ for
$a < x < b$. If

(6.19)    $\lim_{x \to b-} f(x) = 0$  and  $\lim_{x \to b-} g(x) = 0$

and if $\lim_{x \to b-} f'(x)/g'(x) = \lambda$ exists, then

(6.20)    $\lim_{x \to b-} \frac{f(x)}{g(x)} = \lim_{x \to b-} \frac{f'(x)}{g'(x)}.$

Proof. The existence of the limit of $f'(x)/g'(x)$ for $x \to b-$ means that for a
given $\varepsilon > 0$ there exists a $\delta > 0$ such that

(6.21)    $\Bigl|\frac{f'(\xi)}{g'(\xi)} - \lambda\Bigr| < \varepsilon$  for $b - \delta < \xi < b$.

For $u, v \in (b - \delta, b)$ it then follows from Theorem 6.14 that

(6.22)    $\Bigl|\frac{f(u) - f(v)}{g(u) - g(v)} - \lambda\Bigr| < \varepsilon.$

In this formula, we let $v \to b-$, use (6.19), and so obtain $|f(u)/g(u) - \lambda| \le \varepsilon$ for
$b - \delta < u < b$. This proves (6.20). □


(6.16) Remark. With slight modifications of the above proof, one sees that

- the theorem remains true for $b = +\infty$;

- the theorem remains true for $\lambda = +\infty$ or $\lambda = -\infty$; and

- the theorem remains true for the limit $x \to a+$.

(6.17) Theorem. Under the assumptions of Theorem 6.15, where (6.19) is replaced by

(6.23)    $\lim_{x \to b-} f(x) = \infty$  and  $\lim_{x \to b-} g(x) = \infty,$

we also have (6.20).

Proof. We multiply (6.22) by $\frac{g(v) - g(u)}{g(v)} = 1 - \frac{g(u)}{g(v)}$, which gives

(6.24)    $\Bigl|\frac{f(v)}{g(v)} - \frac{f(u)}{g(v)} - \lambda\Bigl(1 - \frac{g(u)}{g(v)}\Bigr)\Bigr| < \varepsilon\,\Bigl|1 - \frac{g(u)}{g(v)}\Bigr|.$

We wish to isolate $|f(v)/g(v) - \lambda|$ in the expression on the left. Using the modified
triangle inequality $|A| - |B| \le |A - B|$ (or $|A| \le |A - B| + |B|$), we obtain

    $\Bigl|\frac{f(v)}{g(v)} - \lambda\Bigr| < \varepsilon\,\Bigl|1 - \frac{g(u)}{g(v)}\Bigr| + \Bigl|\frac{f(u) - \lambda g(u)}{g(v)}\Bigr|.$

Now we keep $u$ fixed and let $v \to b-$. Because of (6.23), the expression on the
right side approaches $\varepsilon$. Therefore, $|f(v)/g(v) - \lambda| < 2\varepsilon$ for $v$ sufficiently close
to $b$. This proves the statement. □


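Examples. The quotient of the functions $f(x) = \sin x$ and $g(x) = x$ gives, for
$x \to 0$, the undetermined expression $0/0$. Applying Theorem 6.15, we compute

(6.25)    $\lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1.$

Obviously, these equalities have to be read from right to left. Since $\lim_{x \to 0} \cos x = 1$
exists, $\lim_{x \to 0} \sin x / x$ also exists and equals $1$.

Next, we consider $f(x) = e^{ax}$ ($a > 0$) and $g(x) = x^n$, which both tend to
$\infty$ for $x \to \infty$. Repeated application of Theorem 6.17 (and Remark 6.16) yields

(6.26)    $\lim_{x \to \infty} \frac{x^n}{e^{ax}} = \lim_{x \to \infty} \frac{n x^{n-1}}{a e^{ax}} = \lim_{x \to \infty} \frac{n(n-1) x^{n-2}}{a^2 e^{ax}} = \ldots = \lim_{x \to \infty} \frac{n!}{a^n e^{ax}} = 0.$

This shows that the exponential function $e^{ax}$ increases faster (for $x \to \infty$) than
any polynomial.

For $a > 0$ we obtain from Theorem 6.17 and Remark 6.16

(6.27)    $\lim_{x \to \infty} \frac{\ln x}{x^a} = \lim_{x \to \infty} \frac{1/x}{a x^{a-1}} = \lim_{x \to \infty} \frac{1}{a x^a} = 0.$

Hence, any polynomial increases faster than a logarithm.

Undetermined expressions of the form

    $0 \cdot \infty$  or  $0^0$  or  $\infty^0$

can be treated as explained in the following examples:

(6.28)    $\lim_{x \to 0+} (x \cdot \ln x) = \lim_{x \to 0+} \frac{\ln x}{1/x} = \lim_{x \to 0+} \frac{1/x}{-1/x^2} = \lim_{x \to 0+} (-x) = 0,$

(6.29)    $\lim_{x \to 0+} x^x = \lim_{x \to 0+} \exp(x \ln x) = \exp\bigl(\lim_{x \to 0+} x \ln x\bigr) = \exp(0) = 1,$

(6.30)    $\lim_{x \to \infty} \sqrt[x]{x} = \lim_{x \to \infty} x^{1/x} = \exp\Bigl(\lim_{x \to \infty} \frac{\ln x}{x}\Bigr) = \exp(0) = 1.$

In the last two examples, we have exploited the continuity of the exponential
function.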
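These limits can be made plausible by direct evaluation. The following Python sketch is only a numerical illustration (the sample points and the function names `x_log_x`, `x_to_x`, `xth_root` are assumptions chosen for the example); it proves nothing, but shows the values drifting toward $0$, $1$ and $1$.

```python
import math


def x_log_x(x):
    return x * math.log(x)            # form 0 * infinity as x -> 0+


def x_to_x(x):
    return math.exp(x_log_x(x))       # form 0^0 as x -> 0+


def xth_root(x):
    return math.exp(math.log(x) / x)  # form infinity^0 as x -> infinity


for k in (2, 4, 8):
    small, large = 10.0 ** (-k), 10.0 ** k
    print(f"x=1e-{k}: x ln x = {x_log_x(small):.3e}, x^x = {x_to_x(small):.9f}; "
          f"x=1e{k}: x^(1/x) = {xth_root(large):.9f}")

print("sin(x)/x at x=1e-4:", math.sin(1e-4) / 1e-4)
```

Note how slowly $x \ln x$ tends to zero compared with $x$ itself; the logarithm only delays, but does not prevent, the convergence.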

Derivatives of Infinite Series

Where is it proved that one obtains the derivative of an infinite series by 
taking the derivative of each term? 

(Abel, Janv. 16, 1826, Oeuvres, vol. 2, p. 258) 

The term-by-term differentiation of infinite series is justified by the following the- 
orem. 

(6.18) Theorem. Let $f_n : (a, b) \to \mathbb{R}$ be a sequence of continuously differentiable
functions. If

i) $\lim_{n \to \infty} f_n(x) = f(x)$ on $(a, b)$, and

ii) $\lim_{n \to \infty} f_n'(x) = p(x)$, where the convergence is uniform on $(a, b)$,

then $f(x)$ is continuously differentiable on $(a, b)$, and for all $x \in (a, b)$ we have

(6.31)    $\lim_{n \to \infty} f_n'(x) = f'(x).$


Proof. As we can guess, the essential “ingredient” of this proof (in addition to the
Fundamental Theorem of Differential Calculus) is Theorem 5.19 on the exchange
of limits and integrals.

We fix $x_0 \in (a, b)$. Because $\{f_n'(x)\}$ converges uniformly on $(a, b)$, we obtain

    $\int_{x_0}^{x} p(t)\,dt = \lim_{n \to \infty} \int_{x_0}^{x} f_n'(t)\,dt = \lim_{n \to \infty} \bigl(f_n(x) - f_n(x_0)\bigr) = f(x) - f(x_0).$

By Theorem 6.9, this shows that $p(x) = f'(x)$ and that (6.31) holds. The continuity
of $f'(x)$ follows from Theorem 4.2. □

(6.19) Counterexamples. The functions (see Fig. 6.4)

(6.32)    $f_n(x) = \frac{x}{1 + n^2 x^2}$  and  $f_n(x) = \frac{1}{n}\sin(nx)$

show that hypothesis (i) (even with uniform convergence) is not sufficient to prove
(6.31).

FIGURE 6.4. Uniform convergence with $\lim f_n' \ne (\lim f_n)'$
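The failure of (6.31) without uniform convergence of the derivatives can be observed numerically. The Python sketch below assumes that the two sequences in (6.32) are $f_n(x) = x/(1 + n^2x^2)$ and $f_n(x) = \sin(nx)/n$ (our reading of the formula); the grid-based `sup_on_grid` is a hypothetical helper that only approximates the supremum.

```python
import math


def f_rational(n, x):
    return x / (1.0 + n * n * x * x)


def df_rational(n, x):
    return (1.0 - n * n * x * x) / (1.0 + n * n * x * x) ** 2


def f_sine(n, x):
    return math.sin(n * x) / n


def df_sine(n, x):
    return math.cos(n * x)


def sup_on_grid(func, n, left=-1.0, right=1.0, points=20001):
    """Maximum of |func(n, x)| over an equidistant grid (a proxy for the sup)."""
    step = (right - left) / (points - 1)
    return max(abs(func(n, left + i * step)) for i in range(points))


for n in (1, 10, 100):
    print(f"n={n}: sup|x/(1+n^2x^2)| ~ {sup_on_grid(f_rational, n):.5f}, "
          f"derivative at 0 = {df_rational(n, 0.0)}; "
          f"sup|sin(nx)/n| ~ {sup_on_grid(f_sine, n):.5f}, "
          f"derivative at pi = {df_sine(n, math.pi):+.1f}")
```

Both sequences tend uniformly to the zero function, whose derivative is $0$; yet the first has derivative $1$ at the origin for every $n$, and the derivatives of the second oscillate between $+1$ and $-1$ at $x = \pi$.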


Exercises 


6.1 Let a positive integer $n$ be given and define $f_n : \mathbb{R} \to \mathbb{R}$ by

    $f_n(x) = \begin{cases} x^n \sin(1/x^3) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$

How often is $f_n$ differentiable and which derivatives of $f_n$ are continuous?

6.2 Show by two different methods (using (6.1) as well as Carathéodory’s formulation
(6.6)) that if $g(x)$ is differentiable at $x_0$ with $g(x_0) \ne 0$, then $1/g(x)$
is also differentiable at $x_0$.

6.3 Show that the following function is increasing on $[0, 1]$:

    $f(x) = \begin{cases} x\,(2 - \cos(\ln x) - \sin(\ln x)) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0, \end{cases}$

but that there are infinitely many points with $f'(\xi) = 0$. Is this a contradiction
to Eq. (6.14)? Is $f(x)$ differentiable at the origin?

6.4 a) Let $h : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and $n$ times differentiable on
$(a, b)$. Show that if $h(x)$ has $n + 1$ zeros in $[a, b]$, then there exists $\xi \in (a, b)$
with $h^{(n)}(\xi) = 0$.

Hint. Apply Rolle’s Theorem repeatedly.

b) Set $h(x) = f(x) - p(x)$, where $p(x)$ is the interpolation polynomial on
equidistant gridpoints (see Eq. (II.2.6)), and conclude that for an $n$ times
differentiable function $f(x)$ (see Eq. (II.2.7)),

(6.33)    $\frac{\Delta^n y_0}{\Delta x^n} = f^{(n)}(\xi).$


6.5 The function of Fig. 6.5, often called “the devil’s staircase”, shows that
Lagrange’s Theorem (see Corollary 6.12a) is not as trivial as it might appear.
If $x$ has a representation in base 3 as, e.g., $x = 0.20220002101220\ldots$, then
$f(x)$ is obtained in base 2 by converting all 2’s preceding the first 1 into 1’s
and deleting all subsequent digits, in our example $f(x) = 0.101100011$. In
particular,

    $f(x) = \frac{1}{2}$ if $x \in [\frac{1}{3}, \frac{2}{3}]$,  $f(x) = \frac{1}{4}$ if $x \in [\frac{1}{9}, \frac{2}{9}]$,  $f(x) = \frac{3}{4}$ if $x \in [\frac{7}{9}, \frac{8}{9}]$.

Show that this function is continuous and nondecreasing. It is differentiable with
derivative $f'(x) = 0$ on a set of measure $1/3 + 2/9 + 4/27 + 8/81 + \ldots = 1$,
hence, as we say, almost everywhere. Nevertheless, $f(0) \ne f(1)$.

FIGURE 6.5. The devil’s staircase


Compute by L’Hospital’s Rule (and using logarithms) lim ^1 — — j . 
6.7 (Approximate rectification of the arc of a circle). Let a circle of radius 1 be
given. For a point M on the circle let N be the point on the tangent at O such
that NO = arc MO. Compute the position of P on the orthogonal diameter
OC collinear with N and M (see Fig. 6.6). What is the limiting position of P
if a tends to zero?

FIGURE 6.6. Approximate rectification of the arc of a circle

Remark. The answer is 3. Therefore, if P is placed exactly at the point x = 3, 
then NO is an excellent approximation for arc MO. 

6.8 Consider the sequence 

/„(*) = /L//* n=l,2,3,4,... . 

Show that $f_n(x)$ converges uniformly on $[-1, 1]$ to a function $f(x)$. Is $f(x)$
differentiable? For which values of $x$ is $\lim_{n \to \infty} f_n'(x) = f'(x)$?


III.7 Power Series and Taylor Series

After a scientific meeting at which Cauchy presented his theory on the con- 
vergence of series Laplace hastened home and remained there in reclusion 
until he had examined the series in his Mécanique céleste. Luckily every 
one was found to be convergent. (M. Kline 1972, p. 972) 

Let $c_0, c_1, c_2, c_3, \ldots$ be a sequence of real coefficients and let $x$ be the independent
variable. Then, we call

(7.1)    $\sum_{n=0}^{\infty} c_n x^n = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \ldots$

a power series. In this section, we investigate the set of $x$-values for which the
series (7.1) converges. We also study properties (continuity, derivative, primitive)
of the function represented by (7.1).

(7.1) Lemma. Suppose that the series (7.1) converges for a certain $\hat{x}$. Then, it also
converges for all $x$ with $|x| < |\hat{x}|$.

Moreover, for each $\eta$ with $0 < \eta < |\hat{x}|$, the series (7.1) converges absolutely
and uniformly on the interval $[-\eta, \eta]$.

Proof. The convergence of the series $\sum_n c_n \hat{x}^n$ implies that the sequence $\{c_n \hat{x}^n\}$
is bounded (see Eq. (2.3) and Theorem 1.3), i.e., there exists a $B \ge 0$ such that
$|c_n \hat{x}^n| \le B$ for all $n \ge 0$. Therefore, for $|x| \le \eta$, we have

    $|c_n x^n| \le |c_n| \eta^n = |c_n \hat{x}^n| \cdot \Bigl|\frac{\eta}{\hat{x}}\Bigr|^n \le B q^n$  with  $q = \Bigl|\frac{\eta}{\hat{x}}\Bigr| < 1.$

By Theorem 2.5, this implies the convergence and the absolute convergence of
$\sum_n c_n x^n$. The uniform convergence follows from Theorem 4.3. □

(7.2) Definition. We set

(7.2)    $\varrho = \sup \bigl\{ |x| \,;\, \sum_{n=0}^{\infty} c_n x^n \text{ converges} \bigr\}$

and call $\varrho$ the radius of convergence of the series (7.1). We set $\varrho = \infty$ if (7.1)
converges for all real $x$.

(7.3) Theorem. The series (7.1) converges for all $x$ satisfying $|x| < \varrho$, it diverges
for all $x$ satisfying $|x| > \varrho$, and we have uniform convergence on $[-\eta, \eta]$ if
$0 < \eta < \varrho$.

Proof. Let $x$ be a value with $|x| < \varrho$. Then, there is an $\hat{x}$ with $|x| < |\hat{x}| \le \varrho$ such
that (7.1) converges for $\hat{x}$ (put $\varepsilon = (\varrho - |x|)/2$ in Definition 1.11). Thus, from
Lemma 7.1, we have convergence for $x$. The uniform convergence on $[-\eta, \eta]$ is
seen in the same way. □

This theorem says nothing about the convergence at $x = -\varrho$ and $x = \varrho$. In
fact, anything can happen at these points, as we shall see in the following example.

(7.4) Example. The series

(7.3)    $\sum_{n=1}^{\infty} \frac{x^n}{n^{\alpha}} = x + \frac{x^2}{2^{\alpha}} + \frac{x^3}{3^{\alpha}} + \frac{x^4}{4^{\alpha}} + \ldots$

is the geometric series (apart from the first term; Example 2.2) for $\alpha = 0$, reduces
to $-\ln(1 - x)$ for $\alpha = 1$ (see Eq. (I.3.14)), and is, for $\alpha = 2$, “Euler’s Dilogarithm”
(Euler 1768, Inst. Calc. Int., Sectio Prima, Caput IV, Exemplum 2). Independently
of $\alpha$, the radius of convergence of (7.3) is $\varrho = 1$ (see Example 7.6 below). For
$\alpha = 0$ the series diverges at both ends of the convergence interval. For $\alpha = 1$
we have divergence for $x = +1$ (harmonic series), but convergence for $x = -1$
(by Leibniz’s criterion). For $\alpha = 2$ the series converges for $x = +1$ and also for
$x = -1$ (see Lemma 2.6).


Determination of the Radius of Convergence 

The following theorems give useful formulas for the computation of the radius of 
convergence. 

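(7.5) Theorem (Cauchy 1821). If $\lim_{n \to \infty} |c_n / c_{n+1}|$ exists (or is $\infty$), then we have

(7.4)    $\varrho = \lim_{n \to \infty} \Bigl|\frac{c_n}{c_{n+1}}\Bigr|.$

Proof. We apply the Ratio Test 2.10 to the series $\sum_n a_n$ with $a_n = c_n x^n$. Since

    $\lim_{n \to \infty} \Bigl|\frac{a_{n+1}}{a_n}\Bigr| = \lim_{n \to \infty} \Bigl|\frac{c_{n+1} x^{n+1}}{c_n x^n}\Bigr| = |x| \lim_{n \to \infty} \Bigl|\frac{c_{n+1}}{c_n}\Bigr| = |x| \Big/ \lim_{n \to \infty} \Bigl|\frac{c_n}{c_{n+1}}\Bigr|,$

the series (7.1) converges if $|x| < \lim |c_n / c_{n+1}|$. For $|x| > \lim |c_n / c_{n+1}|$ it
diverges. This implies Eq. (7.4). □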

(7.6) Examples. For the series (7.3), where $c_n = 1/n^{\alpha}$, we have
$|c_n / c_{n+1}| = (1 + 1/n)^{\alpha} \to 1$ for $n \to \infty$. Therefore, the radius of convergence is $\varrho = 1$.
Similarly, for the binomial series for $(1 + x)^a$ (Theorem I.2.2) we have
$|c_n / c_{n+1}| = (n + 1)/|a - n| \to 1$ and $\varrho = 1$.

The series expansions for $e^x$ (see Theorem I.2.3), for $\sin x$ and $\cos x$ (see
Eqs. (I.4.16) and (I.4.17)) have been proved to converge for all real $x$ (Sect. III.2).
Hence, their radius of convergence is $\varrho = \infty$. An example for a series with $\varrho = 0$
is

    $1 + x + 2!\,x^2 + 3!\,x^3 + 4!\,x^4 + \ldots.$

Here, we have $c_n = n!$ and $|c_n / c_{n+1}| = 1/(n + 1) \to 0$.

The formula of Theorem 7.5 is not directly applicable to the series

(7.5)    $\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \ldots,$

because $|c_n / c_{n+1}|$ is alternately $0$ and $\infty$. If we divide by $x$ and replace $x^2$
by the new variable $z$, then the series (7.5) divided by $x$ becomes $\sum_n c_n z^n$ with
$c_n = (-1)^n / (2n + 1)$. For this series we have $\varrho = 1$ by Theorem 7.5. Hence, the
series (7.5) converges for $|x^2| < 1$ (i.e., $|x| < 1$) and we have $\varrho = 1$.

While Eq. (7.4) requires the existence of the limit, the next result is valid
without restriction (see also Exercise 7.1 below).

(7.7) Theorem (Hadamard 1892). The radius of convergence of the series (7.1) is
given by

(7.6)    $\varrho = \frac{1}{\limsup_{n \to \infty} \sqrt[n]{|c_n|}}.$

Proof. We apply the Root Test 2.11 to the series $\sum_n a_n$ with $a_n = c_n x^n$. Since

    $\limsup_{n \to \infty} \sqrt[n]{|a_n|} = |x| \cdot \limsup_{n \to \infty} \sqrt[n]{|c_n|},$

we see that the series (7.1) converges if $|x| < 1 / \limsup \sqrt[n]{|c_n|}$. It diverges if
$|x| > 1 / \limsup \sqrt[n]{|c_n|}$. □
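The difference between the ratio formula and the root formula shows up already for very simple coefficient sequences. The Python sketch below is a rough illustration under stated assumptions: it evaluates the two expressions at finite $n$ (which only hints at the limit or limsup), and the coefficient sequence $c_n = 3 + (-1)^n$ is a hypothetical example chosen here, not one from the text.

```python
def ratio_estimate(coeff, n):
    """Finite-n value of |c_n / c_{n+1}|, the quantity in the ratio formula."""
    return abs(coeff(n) / coeff(n + 1))


def root_estimate(coeff, n):
    """Finite-n value of 1 / |c_n|^(1/n), a proxy for the root formula."""
    return 1.0 / abs(coeff(n)) ** (1.0 / n)


def inverse_square(n):     # c_n = 1/n^2
    return 1.0 / n ** 2


def oscillating(n):        # hypothetical example: c_n = 3 + (-1)^n
    return 3.0 + (-1) ** n


for n in (10, 11, 1000, 1001):
    print(f"n={n}: 1/n^2 ratio={ratio_estimate(inverse_square, n):.4f} "
          f"root={root_estimate(inverse_square, n):.4f} | "
          f"oscillating ratio={ratio_estimate(oscillating, n):.4f} "
          f"root={root_estimate(oscillating, n):.4f}")
```

For $c_n = 1/n^2$ both estimates approach $1$. For the oscillating coefficients the ratios jump between $2$ and $1/2$ and have no limit, while the root estimates still settle near $1$.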


Continuity 

Let $D$ be the domain of convergence

(7.7)    $D = \{ x \mid \text{series (7.1) converges} \}$

so that the series (7.1) defines a function $f : D \to \mathbb{R}$ given by

(7.8)    $f(x) = \sum_{n=0}^{\infty} c_n x^n$  for $x \in D$.

It is clear from the uniform convergence on $[-\eta, \eta]$ for $0 < \eta < \varrho$ (see Theorems
7.3 and 4.2) that $f(x)$ is a continuous function in the open interval $(-\varrho, \varrho)$. The
following famous theorem of Abel handles the question of continuity at the end
points of the convergence interval.

(7.8) Theorem (Abel 1826). Suppose that the series (7.8) converges for $x_0 = \varrho$ (or
for $x_0 = -\varrho$). Then, the function $f(x)$ is continuous at $x_0 = \varrho$ (or at $x_0 = -\varrho$).

Proof. For simplicity we assume that $\varrho = 1$ and $x_0 = +1$. Otherwise, we stretch
and/or reverse the convergence interval by replacing $x_0$ by $\pm x_0 / \varrho$.

Since, by hypothesis, we have convergence for $x_0 = 1$, it follows from
Lemma 2.3 that for $n \ge N$ and $k \ge 1$,

(7.9)    $|c_{n+1} + c_{n+2} + \ldots + c_{n+k}| < \varepsilon.$

Now, let $x$ be chosen arbitrarily in $[0, 1]$. Then, for $f_n(x) = \sum_{i=0}^{n} c_i x^i$ we have

(7.10)    $f_{n+k}(x) - f_n(x) = c_{n+1} x^{n+1} + c_{n+2} x^{n+2} + \ldots + c_{n+k} x^{n+k}.$

If all $c_i$ are $\ge 0$, it is clear from Eq. (7.9) that $|f_{n+k}(x) - f_n(x)| < \varepsilon$. Otherwise,
we split up (7.10) somewhat more carefully (written here for $k = 4$):

    $c_{n+1} x^{n+4} + c_{n+2} x^{n+4} + c_{n+3} x^{n+4} + c_{n+4} x^{n+4}$
    $+\, c_{n+1} (x^{n+3} - x^{n+4}) + c_{n+2} (x^{n+3} - x^{n+4}) + c_{n+3} (x^{n+3} - x^{n+4})$
    $+\, c_{n+1} (x^{n+2} - x^{n+3}) + c_{n+2} (x^{n+2} - x^{n+3})$
    $+\, c_{n+1} (x^{n+1} - x^{n+2})$

(this process is called Abel’s partial summation, see Exercise 7.2). In each row,
we can now factor out a common (positive) factor $x^{n+k}$, $x^{n+k-1} - x^{n+k}$, $\ldots$ and
obtain, by (7.9) and the triangle inequality,

    $|f_{n+k}(x) - f_n(x)| \le \varepsilon \cdot \bigl(x^{n+k} + x^{n+k-1} - x^{n+k} + \ldots + x^{n+1} - x^{n+2}\bigr) \le \varepsilon$

uniformly on $[0, 1]$. Therefore, the continuity of $f(x)$ at $x_0 = 1$ follows from
Theorem 4.2. □


Differentiation and Integration 

Since $\sqrt[n]{n} \to 1$ for $n \to \infty$ (see Eq. (6.30)), it follows from Theorem 7.7 that the
(term by term) differentiated and integrated power series have the same radius of
convergence as the original series. We then have the following result.

(7.9) Theorem. The function $f(x) = \sum_{n=0}^{\infty} c_n x^n$ is differentiable for $|x| < \varrho$
(where $\varrho$ is the radius of convergence and $\varrho > 0$), and we have

(7.11)    $f'(x) = \sum_{n=1}^{\infty} n\,c_n x^{n-1}.$

It has a primitive on $(-\varrho, \varrho)$, which is given by

(7.12)    $\int_0^x f(t)\,dt = \sum_{n=0}^{\infty} \frac{c_n x^{n+1}}{n + 1}.$

Proof. For $0 < \eta < \varrho$ the convergence of these series is uniform on $[-\eta, \eta]$
(and, of course, also on $(-\eta, \eta)$). It then follows from Theorem 6.18 that $f(x)$
is differentiable on $(-\eta, \eta)$ and that its derivative is given by (7.11). Similarly,
Eq. (7.12) follows from Corollary 5.20. □

(7.10) Remark. If the series (7.1) converges, say, at $x = \varrho$, then the differentiated
series (7.11) need not converge there. This is the case, for example, with the series
(7.3) for $\alpha = 2$. However, the convergence of (7.1) at $x = \varrho$ implies the convergence
of (7.12) at $x = \varrho$ (see Exercise 7.3). With the use of Theorem 7.8, we thus
see that identity (7.12) holds for all $x \in D$.

(7.11) Example. The geometric series (Example 2.2) has radius of convergence
$\varrho = 1$. Integrating it term by term, we obtain from Theorem 7.9 and the definition
of $\ln$ (Sect. I.3) that for $x \in (-1, 1)$

(7.13)    $\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots.$

Moreover, the series in (7.13) converges for $x = 1$ and, by Theorem 7.8, we obtain
$\ln 2 = 1 - 1/2 + 1/3 - 1/4 + \ldots$, this time rigorously.
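The series for $\ln 2$ converges, but slowly. The Python sketch below is only an illustration (the helper name `log_series` and the numbers of terms are assumptions); it compares partial sums of $x - x^2/2 + x^3/3 - \ldots$ with the library logarithm inside the interval of convergence and at the endpoint $x = 1$.

```python
import math


def log_series(x, terms):
    """Partial sum x - x^2/2 + x^3/3 - ... with the given number of terms."""
    return sum((-1) ** (n + 1) * x ** n / n for n in range(1, terms + 1))


print("x = 0.5:", log_series(0.5, 40), " ln(1.5) =", math.log(1.5))
for terms in (10, 100, 1000):
    s = log_series(1.0, terms)
    print(f"x = 1, {terms} terms: {s:.6f}  error = {abs(s - math.log(2)):.2e}")
```

Inside the interval the convergence is geometric, whereas at $x = 1$ the error after $N$ terms is only of the order $1/N$, in line with the alternating-series estimate.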


Taylor Series 

. . . and to estimate the value of the remainder of the series. This prob- 
lem, one of the most important in the theory of series, has not yet been 
solved . . . (Lagrange 1797, p. 42-43, Oeuvres, vol. 9, p. 71) 

In 1797 (second ed. 1813), Lagrange wrote an entire treatise basing analysis on
the Taylor series expansion of a function (see Eq. (II.2.8))

(7.14)    $f(x) = \sum_{i=0}^{\infty} \frac{(x - a)^i}{i!}\, f^{(i)}(a),$

which allowed him, as he thought, to banish infinitely small quantities, limits, and
fluxions (“dégagés de toute considération d’infiniment petits, d’évanouissans, de
limites ou de fluxions”). This dream, however, only lasted some 25 years.

Regarding $x - a$ as a new variable, this series is of the form (7.1) and the
previous results on the convergence of the series can be applied. The first problem
is that there are infinitely differentiable functions for which the series (7.14) does
not converge for any $x \ne a$ (see Exercise 7.6 below). But even convergence of the
series in (7.14) does not necessarily imply the identity in (7.14), as we shall see in
the subsequent counterexample.


(7.12) Counterexample.

. . . Taylor’s formula, which can no longer be admitted in general . . .

(Cauchy 1823, Résumé, p. 1)

Cauchy (1823) considered the function

(7.15)    $f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases}$

which is continuous everywhere. This function is so terribly flat at the origin (see
Fig. 7.1), that $f^{(i)}(0) = 0$ for all $i$. In fact, by the rules of differentiation, we obtain
(for $x \ne 0$)

    $f'(x) = \frac{2}{x^3}\, e^{-1/x^2}, \qquad f''(x) = \Bigl(\frac{4}{x^6} - \frac{6}{x^4}\Bigr) e^{-1/x^2}, \qquad \ldots$

and we see that $f^{(i)}(x)$ is a polynomial in $1/x$ multiplied by $e^{-1/x^2}$. Since for
all $n$ the functions $x^{-n} e^{-1/x^2}$ tend to zero as $x \to 0$ (see the examples after
Theorem 6.17), we have $f^{(i)}(x) \to 0$ for $x \to 0$. The fact that also $f^{(i)}(x)/x \to 0$
for $x \to 0$ implies that $f^{(i+1)}(0) = \lim_{h \to 0} f^{(i)}(h)/h = 0$.

FIGURE 7.1. Graph of $e^{-1/x^2}$ and its derivatives

Thus, the Taylor series for the function $f(x)$ of (7.15) is $0 + 0 + 0 + \ldots$ and
obviously converges for all $x$. But, formula (7.14) is wrong for $x \ne 0$.
In order to establish Eq. (7.14) for particular functions, we have to consider
partial sums of Taylor series and to estimate their error. A useful formula in this
context has already been derived at the end of Sect. II.4. It is summarized in the
following theorem.

(7.13) Theorem. Let $f(x)$ be $k + 1$ times continuously differentiable on $[a, x]$ (or
on $[x, a]$ if $x < a$). Then, we have

    $f(x) = \sum_{i=0}^{k} \frac{(x - a)^i}{i!}\, f^{(i)}(a) + \int_a^x \frac{(x - t)^k}{k!}\, f^{(k+1)}(t)\,dt.$

The Binomial Series.

. . . but the one which gives me most pleasure is a paper ... on the simple
series $1 + mx + \frac{m(m - 1)}{2}\, x^2 + \ldots$
I dare say that this is the first rigorous proof of the binomial formula . . .

(Abel, letter to Holmboe 1826, Oeuvres, vol. 2, p. 261)

A rigorous proof of the binomial identity

(7.16)    $(1 + x)^a = 1 + a x + \frac{a(a - 1)}{2!}\, x^2 + \frac{a(a - 1)(a - 2)}{3!}\, x^3 + \ldots$

for $|x| < 1$ and arbitrary $a$ was first considered by Abel in 1826. A proof based on
Taylor series can be found in Weierstrass’s lecture of 1861 (see Weierstrass 1861).

If we put $f(x) = (1 + x)^a$ and compute its derivatives $f'(x) = a(1 + x)^{a-1}$,
$f''(x) = a(a - 1)(1 + x)^{a-2}$, $\ldots$, we observe that the series of (7.16) is simply
the Taylor series of $f(x) = (1 + x)^a$. Its radius of convergence has been computed
as $\varrho = 1$ in Example 7.6. In order to prove identity (7.16) for $|x| < 1$, we have to
show that the remainder (see Theorem 7.13)

(7.17)    $R_k(x) = \int_0^x \frac{(x - t)^k}{k!}\, a(a - 1) \cdots (a - k)(1 + t)^{a-k-1}\,dt$

converges to zero for $k \to \infty$.

Using Theorem 5.17 and putting $\xi = \theta_k x$ with $0 < \theta_k < 1$, we obtain

    $R_k(x) = \frac{(x - \theta_k x)^k}{k!}\, a(a - 1) \cdots (a - k)(1 + \theta_k x)^{a-k-1} \cdot x$

    $\phantom{R_k(x)} = \Bigl(\frac{1 - \theta_k}{1 + \theta_k x}\Bigr)^k \cdot \frac{(a - 1)(a - 2) \cdots (a - k)}{k!}\, x^k \cdot a x\,(1 + \theta_k x)^{a-1}.$

The factor $ax$ is a constant; $(1 + \theta_k x)^{a-1}$ lies between $(1 + x)^{a-1}$ and $1$ and is
bounded; $0 < 1 - \theta_k < 1 + \theta_k x$ for all $x$ satisfying $|x| < 1$ implies that the factor
$\bigl((1 - \theta_k)/(1 + \theta_k x)\bigr)^k$ is bounded by $1$. Since the remaining factor

    $\frac{(a - 1)(a - 2) \cdots (a - k)}{k!}\, x^k$

is, for $|x| < 1$, the general term of a convergent series, it tends to zero by (2.3).
Consequently, we have $R_k(x) \to 0$ for $k \to \infty$ and the identity (7.16) is established
for $|x| < 1$.

Whenever the series (7.16) converges for $x = +1$ or $x = -1$, it represents a
continuous function and thus equals $(1 + x)^a$ at these points also (Theorem 7.8).
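The identity (7.16) is easy to test numerically for a non-integer exponent. The following Python sketch is a hedged illustration: the exponent $a = 1/2$, the sample points and the helper `binomial_partial_sum` are assumptions made for the example. Each term is obtained from the previous one by the factor $(a - k)\,x/(k + 1)$.

```python
def binomial_partial_sum(a, x, terms):
    """Sum of the first `terms` terms of 1 + a x + a(a-1)/2! x^2 + ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a - k) / (k + 1) * x   # next term from the previous one
    return total


a = 0.5   # illustrative exponent (an assumption for this example)
for x in (0.5, -0.5, 0.9):
    approx = binomial_partial_sum(a, x, 200)
    print(f"x={x}: series={approx:.12f}  (1+x)^a={(1 + x) ** a:.12f}")
```

For a positive integer exponent the factor $(a - k)$ eventually vanishes, so the loop reproduces the finite binomial formula; for other exponents the partial sums approach $(1 + x)^a$ only when $|x| < 1$ (or at an endpoint where the series converges).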
Estimate of the Remainder without Integral Calculus. The attempts of Lagrange
(1797) to evaluate the remainder in Taylor’s formula were crowned by the
following elegant formulas (“ce théorème nouveau et remarquable par sa simplicité
et sa généralité . . .”):

          $f(x) = f(a) + (x - a) f'(\xi)$

(7.18)    $f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2!}\, f''(\xi)$

          $f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2!}\, f''(a) + \frac{(x - a)^3}{3!}\, f'''(\xi),$

etc., where $\xi$ is an unknown value between $a$ and $x$.

(7.14) Theorem (Lagrange 1797). Let $f(x)$ be continuous on $[a, x]$ and $k + 1$
times differentiable on $(a, x)$. Then, there exists $\xi \in (a, x)$ such that

    $f(x) = \sum_{i=0}^{k} \frac{(x - a)^i}{i!}\, f^{(i)}(a) + \frac{(x - a)^{k+1}}{(k + 1)!}\, f^{(k+1)}(\xi).$

Proof. We follow an elegant idea of Cauchy (1823), denote the remainder by

(7.19)    $R_k(x) = f(x) - \sum_{i=0}^{k} \frac{(x - a)^i}{i!}\, f^{(i)}(a),$

and compare it to the function $S_k(x) = (x - a)^{k+1} / (k + 1)!$. We have

    $R_k(a) = 0, \quad R_k'(a) = 0, \quad \ldots, \quad R_k^{(k)}(a) = 0,$

and similarly, $S_k^{(i)}(a) = 0$ for $i = 0, 1, \ldots, k$. Applying Theorem 6.14 repeatedly,
we get

(7.20)    $\frac{R_k(x)}{S_k(x)} = \frac{R_k(x) - R_k(a)}{S_k(x) - S_k(a)} = \frac{R_k'(\xi_1)}{S_k'(\xi_1)} = \frac{R_k'(\xi_1) - R_k'(a)}{S_k'(\xi_1) - S_k'(a)} = \frac{R_k''(\xi_2)}{S_k''(\xi_2)} = \ldots = \frac{R_k^{(k+1)}(\xi_{k+1})}{S_k^{(k+1)}(\xi_{k+1})},$

where $\xi_1$ lies between $x$ and $a$, $\xi_2$ between $\xi_1$ and $a$, and so on. Since
$S_k^{(k+1)}(x) = 1$ and $R_k^{(k+1)}(x) = f^{(k+1)}(x)$, we obtain from (7.20) that

    $R_k(x) = S_k(x) \cdot f^{(k+1)}(\xi)$

with $\xi = \xi_{k+1}$. This completes the proof of the theorem. □

Remark. The relation between the remainders of Theorems 7.13 and 7.14 is given
by Theorem 5.18. For the original proof of Lagrange see Exercise 7.8 below.
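Lagrange's form of the remainder gives two-sided bounds whenever the derivative $f^{(k+1)}$ can be bracketed. As a hedged illustration (the choice $f = \exp$, $a = 0$, $x = 1$ and the helper `taylor_exp` are assumptions, not taken from the text), the Python sketch below checks that the actual error lies between $\frac{1}{(k+1)!}$ and $\frac{e}{(k+1)!}$, since $e^{\xi}$ lies between $1$ and $e$ for $\xi \in (0, 1)$.

```python
import math


def taylor_exp(x, k):
    """Taylor polynomial of degree k of exp at a = 0."""
    return sum(x ** i / math.factorial(i) for i in range(k + 1))


x = 1.0
for k in (2, 5, 10):
    remainder = math.exp(x) - taylor_exp(x, k)
    scale = x ** (k + 1) / math.factorial(k + 1)     # (x-a)^(k+1)/(k+1)!
    lower, upper = scale * 1.0, scale * math.exp(x)  # exp(xi) lies in (1, e)
    print(f"k={k}: lower={lower:.3e}  remainder={remainder:.3e}  upper={upper:.3e}")
```

The bracket tightens like $1/(k+1)!$, which is why a handful of terms already gives $e$ to many digits.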


Exercises 

7.1 Determine the radius of convergence of the series

    $f(x) = 1 + 2x + x^2 + 2x^3 + x^4 + 2x^5 + \ldots$

and show that Theorem 7.5 is not applicable, but that Theorem 7.7 is.

7.2 (Partial summation, Abel 1826). Let $\{a_n\}$ and $\{b_n\}$ be two sequences. Prove
that

    $\sum_{n=0}^{N} a_n b_n = \sum_{n=0}^{N} A_n (b_n - b_{n+1}) + A_N b_{N+1} - A_{-1} b_0,$

where $A_{-1} = \alpha$ is an arbitrary constant and $A_n = \alpha + a_0 + a_1 + \ldots + a_n$.
Hint. Use the identity

    $a_n b_n = (A_n - A_{n-1}) b_n = A_n (b_n - b_{n+1}) - A_{n-1} b_n + A_n b_{n+1}.$

7.3 Consider the series

    $\sum_{n=0}^{\infty} c_n$  and  $\sum_{n=0}^{\infty} \frac{c_n}{n + 1}.$

Prove that the convergence of the first series implies that of the second. The
proof will encounter a difficulty similar to that in the proof of Theorem 7.8,
which can be settled by a similar idea (partial summation, see preceding exercise).


7.4 Investigate the convergence of the series of Newton-Gregory

    $\arcsin(x) = x + \frac{1}{2}\,\frac{x^3}{3} + \frac{1 \cdot 3}{2 \cdot 4}\,\frac{x^5}{5} + \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6}\,\frac{x^7}{7} + \ldots$

for $x = 1$ and $x = -1$.

Hint. Wallis’s product will be useful for understanding the asymptotic behav- 
ior of the coefficients. 


7.5 Let $D'$ be the domain of convergence for the series in Eq. (7.11). Prove that
the identity in Eq. (7.11) holds for all $x \in D'$.

7.6 An infinitely differentiable function whose Taylor series does not converge
(see Lerch 1888, Pringsheim 1893); show that the series

    $f(x) = \frac{\cos 2x}{1!} + \frac{\cos 4x}{2!} + \frac{\cos 8x}{3!} + \frac{\cos 16x}{4!} + \ldots$

and all its derivatives converge uniformly in $\mathbb{R}$.
Show that its Taylor series at the origin is

    $f(0) + f'(0)\,x + \ldots = (e - 1) - \frac{e^4 - 1}{2!}\, x^2 + \frac{e^{16} - 1}{4!}\, x^4 - \frac{e^{64} - 1}{6!}\, x^6 + \ldots$

and diverges for all $x \ne 0$.

Nevertheless, for the computation of, say, $f(0.01)$ (correct value $f(0.01) =
1.71572953$) the first two terms of this series are useful. Why?


7.7 Investigate the convergence of the series (7.16) for $x = 1$ and $x = -1$.

7.8 Find formulas (7.18) in the footprints of Lagrange by using, as we would say
today, a “homotopy” argument.

Hint. Put

(7.21)    $f(x) = f(x - zx) + zx\,f'(x - zx) + \frac{z^2 x^2}{2!}\, f''(x - zx) + \frac{x^3}{3!}\, R(z),$

where $z$ is a variable between $0$ and $1$ and where $x$ is considered as a fixed
constant. Setting $z = 0$, we find $R(0) = 0$, and with $z = 1$, we see that
$(x^3/3!)\,R(1)$ is the error term we are looking for. Now, differentiate (7.21)
with respect to $z$ and find $R'(z) = 3 z^2 f'''(x - zx)$. Finally, integrate from
$0$ to $1$ and apply Theorem 5.18.

7.9 (Abel 1826). Prove that if the series $\sum_i a_i$, $\sum_j b_j$ and their Cauchy product
converge, identity (2.19) holds.

Hint. Apply Abel’s Theorem 7.8 to the function $f(x) \cdot g(x)$, where
$f(x) = \sum_i a_i x^i$ and $g(x) = \sum_j b_j x^j$.


The theory of the Riemann integral $\int_a^b f(x)\,dx$ in Sect. III.5 is based on the
assumptions that $[a, b]$ is a finite interval and the function $f(x)$ is bounded on this
interval. We shall show how these restrictions can be circumvented. If at least one
of the two assumptions is violated, we speak of an improper integral.


Bounded Functions on Infinite Intervals 


(8.1) Definition. Let $f : [a, \infty) \to \mathbb{R}$ be integrable on every interval $[a, b]$ with
$b > a$. If the limit

    $\int_a^{\infty} f(x)\,dx := \lim_{b \to \infty} \int_a^b f(x)\,dx$

exists, then we say that $f(x)$ is integrable on $[a, \infty)$ and that $\int_a^{\infty} f(x)\,dx$ is a
convergent integral.


Only wimps do the general case. True teachers tackle examples. 

(Parlett, see Math. Intelligencer, vol. 14, No. 1, p. 35) 


(8.2) Examples. Consider first the exponential function on the interval $[0, \infty)$. By
Definition 8.1, we have

$$\int_0^\infty e^{-x}\,dx = \lim_{b\to\infty}\int_0^b e^{-x}\,dx = \lim_{b\to\infty}\Bigl(-e^{-x}\Big|_0^b\Bigr) = \lim_{b\to\infty}\bigl(1 - e^{-b}\bigr) = 1.$$

Once we are accustomed to this definition, we simply write

$$\int_0^\infty e^{-x}\,dx = -e^{-x}\Big|_0^\infty = 1. \tag{8.1}$$


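Next, consider the function $x^{-\alpha}$ on $[1, \infty)$:

$$\int_1^\infty \frac{dx}{x^\alpha} = \int_1^\infty x^{-\alpha}\,dx = \frac{x^{1-\alpha}}{1-\alpha}\Big|_1^\infty = \begin{cases} \infty & \text{if } \alpha < 1 \\ (\alpha-1)^{-1} & \text{if } \alpha > 1. \end{cases} \tag{8.2}$$

For $\alpha = 1$ a primitive is $\ln x$ and the improper integral diverges.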
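
The three cases can be watched numerically. The sketch below simply evaluates the primitive of $x^{-\alpha}$ on $[1, b]$ for growing $b$; the function name `truncated_integral` is ours.

```python
import math

def truncated_integral(alpha, b):
    """Integral of x**(-alpha) over [1, b], evaluated from the primitive."""
    if alpha == 1.0:
        return math.log(b)
    return (b ** (1.0 - alpha) - 1.0) / (1.0 - alpha)

for alpha in (0.5, 1.0, 2.0):
    # alpha = 0.5 and alpha = 1 keep growing with b; alpha = 2 approaches 1/(alpha - 1) = 1
    print(alpha, [truncated_integral(alpha, 10.0 ** k) for k in (1, 3, 6)])
```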

But how can we check the integrability on $[a, \infty)$ if no primitive is known
explicitly?


(8.3) Lemma. Let $f : [a, \infty) \to \mathbb{R}$ be integrable on every interval $[a, b]$.

a) If $|f(x)| \le g(x)$ for all $x \ge a$ and if $\int_a^\infty g(x)\,dx$ is convergent, then
$\int_a^\infty f(x)\,dx$ is also convergent.

b) If $0 \le g(x) \le f(x)$ for all $x \ge a$ and if $\int_a^\infty g(x)\,dx$ is divergent, then
$\int_a^\infty f(x)\,dx$ also diverges.

Proof. Part (a) follows from Cauchy’s criterion (Theorem 3.12), and from Theorem 5.14,
because $\bigl|\int_b^{\hat b} f(x)\,dx\bigr| \le \int_b^{\hat b} |f(x)|\,dx \le \int_b^{\hat b} g(x)\,dx < \varepsilon$ for sufficiently
large $b < \hat b$. Part (b) is obvious. □

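(8.4) Example. For $\alpha > 0$ we consider the function $(1 + x^\alpha)^{-1}$ on the interval
$[0, \infty)$. We split the integral according to

$$\int_0^\infty \frac{dx}{1 + x^\alpha} = \int_0^1 \frac{dx}{1 + x^\alpha} + \int_1^\infty \frac{dx}{1 + x^\alpha}. \tag{8.3}$$

The first integral is “proper”. For the second integral we use the estimates

$$\frac{1}{2x^\alpha} \le \frac{1}{1 + x^\alpha} \le \frac{1}{x^\alpha} \qquad \text{for } x \ge 1.$$

It thus follows from Lemma 8.3 and Eq. (8.2) that the integral (8.3) converges for
$\alpha > 1$ and diverges for $\alpha \le 1$.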



FIGURE 8.1. Graph of sin x /x 


(8.5) Example. Let us investigate the existence of

$$\int_0^\infty \frac{\sin x}{x}\,dx. \tag{8.4}$$

The function $f(x) = \sin x/x$ is continuous at $x = 0$ with $f(0) = 1$ and so poses
no difficulty at this point. Using the estimate $|\sin x| \le 1$ would be pointless, since
the integral $\int_1^\infty x^{-1}\,dx$ diverges. But the graph of $f(x)$ (see Fig. 8.1) shows that
the integral can be written as an alternating series of the form $a_0 - a_1 + a_2 - a_3 + \ldots$, where

$$a_0 = \int_0^\pi \frac{\sin x}{x}\,dx, \quad a_1 = -\int_\pi^{2\pi} \frac{\sin x}{x}\,dx, \quad a_2 = \int_{2\pi}^{3\pi} \frac{\sin x}{x}\,dx, \quad \ldots.$$

This series converges by Leibniz’s criterion (Theorem 2.4). The condition $a_{i+1} < a_i$
can be verified with the help of the substitution $x \mapsto x - \pi$, and $a_i \to 0$ follows
from the simple estimate $0 < a_i < 1/i$.
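
The alternating structure can be observed numerically. The sketch below approximates $a_i = \bigl|\int_{i\pi}^{(i+1)\pi} \sin x / x\,dx\bigr|$ with a composite Simpson rule (our choice of quadrature, not the book's) and sums the alternating series; the comparison value $\pi/2$ is the known value of the integral (8.4), which is not derived in this section.

```python
import math

def sinc(x):
    # sin(x)/x with the continuous extension f(0) = 1
    return 1.0 if x == 0.0 else math.sin(x) / x

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def a(i):
    """a_i = |integral of sin(x)/x over [i*pi, (i+1)*pi]|."""
    return abs(simpson(sinc, i * math.pi, (i + 1) * math.pi))

partial = 0.0
for i in range(400):
    partial += (-1) ** i * a(i)
print(partial, math.pi / 2)  # partial sums oscillate around the limit
```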
Proof. Let $g(x) = f([x])$ and $h(x) = f([x] + 1)$ be the step functions drawn in
Fig. 8.2 (here $[x]$ denotes the largest integer not exceeding $x$). These functions are
integrable on finite intervals (Theorem 5.11), and, since $f(x)$ is monotonic, we
have $h(x) \le f(x) \le g(x)$ for all $x$. Consequently,

$$\sum_{n=2}^{N} f(n) \le \int_1^N f(x)\,dx \le \sum_{n=1}^{N-1} f(n),$$

and the statement follows from Theorem 1.13 since $f(x) \ge 0$. □

As integrals are often easier to calculate than sums, this theorem is very 
useful for discussing the convergence of series. For example, the computation of 
Eq. (8.2) gives an elegant new proof for Lemma 2.6. 
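
As a small illustration, the sketch below checks, for the nonincreasing function $f(x) = 1/x^2$ (our choice of example), that the integral over $[1, N]$ lies between the two sums $\sum_{n=2}^{N} f(n)$ and $\sum_{n=1}^{N-1} f(n)$, which is the comparison underlying the integral criterion.

```python
def f(x):
    return 1.0 / (x * x)

def integral_1_to(N):
    # a primitive of 1/x^2 is -1/x
    return 1.0 - 1.0 / N

for N in (10, 100, 1000):
    lower = sum(f(n) for n in range(2, N + 1))   # sum over n = 2..N
    upper = sum(f(n) for n in range(1, N))       # sum over n = 1..N-1
    print(N, lower, integral_1_to(N), upper)
```

Since the integral stays below 1 for every $N$, the partial sums stay bounded as well.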

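If we try to study what happens “between” the divergent series $\sum 1/n$ and
the convergent series $\sum 1/n^\alpha$ (for some $\alpha > 1$), we are led to the investigation of

$$\sum_{n=2}^\infty \frac{1}{n\,(\ln n)^\beta} \tag{8.5}$$

(for large $n$ and any $\alpha > 1$ and $\beta > 0$ we have $n < n(\ln n)^\beta < n^\alpha$ by Eq. (6.27)).
With the transformation $u = \ln x$, we have

$$\int_2^\infty \frac{dx}{x\,(\ln x)^\beta} = \int_{\ln 2}^\infty \frac{du}{u^\beta},$$

and Theorem 8.6, together with Eq. (8.2), proves that the series (8.5) converges
for $\beta > 1$, but diverges for $\beta \le 1$.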
Integrals from $-\infty$ to $+\infty$. It would be injudicious to define

$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\xi\to\infty} \int_{-\xi}^{\xi} f(x)\,dx \tag{8.6}$$

(if the limit exists). This would produce nonsense, for example, by applying the
transformation formula (II.4.14) with $z = x + 1$ ($dz = dx$). With the above
definition, we would have

$$0 = \int_{-\infty}^{\infty} x\,dx = \int_{-\infty}^{\infty} (z - 1)\,dz = -\infty.$$


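(8.7) Definition. Let $f : \mathbb{R} \to \mathbb{R}$ be integrable on every bounded interval $[a, b]$.
Then, we say that

$$\int_{-\infty}^{\infty} f(x)\,dx := \int_{-\infty}^{0} f(x)\,dx + \int_0^{\infty} f(x)\,dx$$

exists if both improper integrals to the right exist.


The two integrals

$$\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} \qquad \text{and} \qquad \int_{-\infty}^{\infty} e^{-x^2}\,dx$$

converge in the sense of Definition 8.7. The first one is equal to $\pi$ (a primitive is
$\arctan x$). The convergence of the second integral is seen from Lemma 8.3 by
using $e^{-x^2} \le e^{-x}$ for $x \ge 1$.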


Unbounded Functions on a Finite Interval 

(8.8) Definition (Gauss 1812, §36). If $f : (a, b] \to \mathbb{R}$ is integrable on every
interval of the form $[a + \varepsilon, b]$, then we define

$$\int_a^b f(x)\,dx := \lim_{\varepsilon \to 0+} \int_{a+\varepsilon}^b f(x)\,dx,$$

if the limit exists.


This definition includes situations where $|f(x)| \to \infty$ for $x \to a$. A similar
definition is possible when $|f(x)| \to \infty$ for $x \to b$. In order to check the
integrability of such a function, Lemma 8.3 can be adapted without any difficulty.

(8.9) Examples. For the function $x^{-\alpha}$ considered on the interval $(0, 1]$ we have

$$\int_0^1 \frac{dx}{x^\alpha} = \lim_{\varepsilon\to 0+}\int_\varepsilon^1 \frac{dx}{x^\alpha} = \lim_{\varepsilon\to 0+} \frac{x^{1-\alpha}}{1-\alpha}\Big|_\varepsilon^1 = \begin{cases} \text{diverges} & \text{if } \alpha > 1 \\ (1-\alpha)^{-1} & \text{if } \alpha < 1. \end{cases} \tag{8.7}$$

The case $\alpha = 1$ also leads to a divergent integral. Hence, the hyperbola $y = 1/x$
($\alpha = 1$) is the limiting case with infinite area on the left ($0 < x \le 1$) and on the
right ($x \ge 1$). If $\alpha$ decreases, the left area becomes finite; if $\alpha$ increases, the right
area becomes finite.
The integral

$$\int_0^1 \frac{\sin x}{x^\alpha}\,dx = \int_0^1 \frac{\sin x}{x} \cdot \frac{1}{x^{\alpha-1}}\,dx$$

converges if and only if $\alpha - 1 < 1$, i.e., $\alpha < 2$. This is due to the fact that
$f(x) = \sin x/x$ is continuous at zero with $f(0) = 1$.


Euler’s Gamma Function 

Throughout his life, Euler was interested in “interpolating” the factorials $0! = 1$,
$1! = 1$, $2! = 2$, $3! = 6$, $4! = 24$, … at noninteger values. He wrote for
this $1 \cdot 2 \cdot 3 \cdot 4 \cdots x$ (“De Differentiatione Functionum Inexplicabilium”, see
1755, Caput XVI of Inst. Calc. Diff., Opera, vol. X). In 1781 he finally found the definition
(totally “explicabilium”) used today: integration by parts applied to the
following integral (with $u(x) = x^n$, $v'(x) = e^{-x}$) yields

$$\int_0^\infty x^n e^{-x}\,dx = -x^n e^{-x}\Big|_0^\infty + n\int_0^\infty x^{n-1} e^{-x}\,dx. \tag{8.8}$$

The term $x^n e^{-x}$ vanishes for $x = 0$ ($n > 0$) and for $x \to \infty$, so we find that

$$\int_0^\infty x^n e^{-x}\,dx = n! \tag{8.9}$$

Here, we have no problem replacing $n$ by a noninteger real number:

(8.10) Definition. For $\alpha > 0$ we define

$$\Gamma(\alpha) := \int_0^\infty x^{\alpha-1} e^{-x}\,dx. \tag{8.10}$$


We have to show that the integral of Eq. (8.10) is convergent. There are two
difficulties: the integrated function is unbounded for $x \to 0$ (if $\alpha < 1$) and the
integration interval is infinite. We therefore split the integral into

$$\int_0^1 x^{\alpha-1} e^{-x}\,dx + \int_1^\infty x^{\alpha-1} e^{-x}\,dx. \tag{8.11}$$

It follows from the estimate $x^{\alpha-1} e^{-x} \le x^{\alpha-1}$, from Lemma 8.3, and from
Eq. (8.7) that the first integral in (8.11) converges for $\alpha > 0$. For the second
integral in (8.11) we use the estimate $x^{\alpha-1} e^{-x} = x^{\alpha-1} e^{-x/2} \cdot e^{-x/2} \le M e^{-x/2}$
(see the examples after Theorem 6.17) and again Lemma 8.3.

Equation (8.9) and the computation in Eq. (8.8) show that

$$\Gamma(n + 1) = n!, \qquad \Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha) \quad \text{for } \alpha > 0. \tag{8.12}$$
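
The two relations in (8.12) are easy to test numerically. The sketch below truncates the integral (8.10) at an upper limit of 60 and uses a composite Simpson rule; both choices are ours and are only adequate for $\alpha \ge 1$, where the integrand is bounded. Python's standard `math.gamma` is used for comparison.

```python
import math

def simpson(g, lo, hi, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def gamma_integral(alpha, upper=60.0, n=6000):
    """Truncated integral of x^(alpha-1) e^(-x) over [0, upper]; intended for alpha >= 1."""
    return simpson(lambda x: x ** (alpha - 1.0) * math.exp(-x), 0.0, upper, n)

for m in range(1, 6):
    # Gamma(m + 1) should reproduce m!
    print(m, gamma_integral(m + 1.0), math.factorial(m), math.gamma(m + 1.0))
```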

With the help of the second relation in (8.12), one can extend the definition of
$\Gamma(\alpha)$ to negative $\alpha$ ($\alpha \neq -1, -2, -3, \ldots$) by setting

$$\Gamma(\alpha - 1) = \frac{\Gamma(\alpha)}{\alpha - 1} \tag{8.13}$$

(see Fig. 8.3). We shall see in Sect. IV.5 that $\Gamma(1/2) = \sqrt{\pi}$.



Exercises 


8.1 Show that the Fresnel integrals (see Fig. II.6.2)

$$\int_0^\infty \sin(x^2)\,dx, \qquad \int_0^\infty \cos(x^2)\,dx$$

converge (you can also use a change of coordinates and find an integral similar
to (8.4); compare with Fig. 8.1).

8.2 Show that for the sequence 




$\lim_{n\to\infty} a_n$ exists and $1 < \lim_{n\to\infty} a_n < 2$ (it might be helpful to remember
that $\int (1/\sqrt{x})\,dx = 2\sqrt{x}$).

8.3 Show, by using an appropriate change of coordinates, that 
III.9 Two Theorems on Continuous Functions


This section is devoted to two results of Weierstrass. The first proves the existence 
of continuous functions that are nowhere differentiable. The second shows that a 
continuous function / : [a, b] —> R can be approximated arbitrarily closely by 
polynomials. 


Continuous, but Nowhere Differentiable Functions 

Until very recently it was generally believed, that a . . . continuous function 
. . . always has a first derivative whose value can be indefinite or infinite 
only at some isolated points. Even in the work of Gauss, Cauchy, Dirichlet, 
mathematicians who were accustomed to criticize everything in their field 
most severely, there can not be found, as far as I know, any expression of a 
different opinion. (Weierstrass 1872) 

A hundred years ago such a function would have been considered an out- 
rage on common sense. 

(Poincaré 1899, L’oeuvre math. de Weierstrass, Acta Math., vol. 22, p. 5) 



FIGURE 9.1. Riemann’s function (9.1) near $x = \pi$


Before the era of Riemann and Weierstrass, it was generally believed that every
continuous function was also differentiable, with the possible exception of some
singular points (see quotations). In 1806, A.-M. Ampère (a name that you have
surely heard) even published a “proof” of this fact (J. École Polyt., vol. 6, p. 148).
The first shock was Riemann’s example (5.24), which, when integrated, produces
a function which is not differentiable on an everywhere dense set of points. This
opened the way to the search for functions that were nowhere differentiable. About
1861 (see Weierstrass 1872), Riemann thought that the function (see Eq. (3.7))

$$f(x) = \sum_{n=1}^\infty \frac{\sin(n^2 x)}{n^2} = \sin x + \frac{1}{4}\sin(4x) + \frac{1}{9}\sin(9x) + \ldots, \tag{9.1}$$

which is continuous since the series converges uniformly (see Theorems 4.3 and
4.2), is nowhere differentiable. Weierstrass declared himself unable to prove this
assertion and, indeed, Gerver (1970) found that (9.1) is differentiable at selected
points, for example at $x = \pi$ (see Fig. 9.1).


(9.1) Theorem (Weierstrass 1872). There exist continuous functions that are 
nowhere differentiable. 

Proof. Weierstrass showed, after two pages of calculation, that

$$f(x) = \sum_{n} b^n \cos(a^n x), \tag{9.2}$$

which is uniformly convergent for $b < 1$, is nowhere differentiable for $ab >
1 + 3\pi/2$. Many later researchers, intrigued by this phenomenon, found new examples,
in particular Dini (1878, Chap. 10), von Koch (1906, see Fig. IV.5.6 below),
Hilbert (1891, see Fig. IV.2.3 below), and Takagi (1903). Takagi’s function was
reinvented by Tall (1982) and named the “blancmange function”. This function is
defined as follows: we consider the function

$$K(x) = \begin{cases} x & 0 \le x \le 1/2 \\ 1 - x & 1/2 \le x \le 1 \end{cases} \tag{9.3}$$

and extend it periodically (i.e., $K(x + 1) = K(x)$ for all $x$) in order to get a
continuous zigzag function. Then, we define (see Fig. 9.2)

$$f(x) = \sum_{n=0}^\infty \frac{1}{2^n} K(2^n x) = K(x) + \frac{1}{2}K(2x) + \frac{1}{4}K(4x) + \frac{1}{8}K(8x) + \ldots. \tag{9.4}$$

Since $|K(x)| \le 1/2$ and $1 + 1/2 + 1/4 + 1/8 + \ldots$ converges, the series (9.4)
is seen to converge uniformly (Theorem 4.3) and represents a continuous function
$f(x)$ (Theorem 4.2).

In order to see that it is nowhere differentiable, we use an elegant argument
of de Rham (1957). Let a point $x_0$ be given. The idea is to choose $\alpha_n = i/2^n$
and $\beta_n = (i + 1)/2^n$, where $i$ is the integer with $\alpha_n \le x_0 < \beta_n$, and to consider
the quotient

$$r_n = \frac{f(\beta_n) - f(\alpha_n)}{\beta_n - \alpha_n}. \tag{9.5}$$

Since at the values $\alpha_n$ and $\beta_n$ the sum in (9.4) is finite, $r_n$ is the slope of the
truncated series $\sum_{j=0}^{n-1} \frac{1}{2^j} K(2^j x)$ on the interval $(\alpha_n, \beta_n)$ (see Fig. 9.2 where, for
$x_0 = 1/3$, these slopes can be seen to be 0, 1, 0, 1, …).

With increasing $n$, we always have $r_{n+1} = r_n \pm 1$, and the sequence $\{r_n\}$
cannot converge.

On the other hand, $r_n$ is a mean of the slopes

$$r_n = \lambda_n \frac{f(\beta_n) - f(x_0)}{\beta_n - x_0} + (1 - \lambda_n) \frac{f(x_0) - f(\alpha_n)}{x_0 - \alpha_n},$$

where $\lambda_n = (\beta_n - x_0)/(\beta_n - \alpha_n) \in (0, 1]$ (if $\alpha_n = x_0$ we have $\lambda_n = 1$ and the
second term is not present). Differentiability at $x_0$ would therefore imply that

$$|r_n - f'(x_0)| < \lambda_n \varepsilon + (1 - \lambda_n)\varepsilon = \varepsilon$$

for sufficiently large $n$, which is a contradiction. □
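
The slopes $r_n$ can be computed exactly with rational arithmetic. The sketch below assumes $K(x)$ is the distance from $x$ to the nearest integer (the periodic zigzag with $K(x) = x$ on $[0, 1/2]$) and uses the fact that at a dyadic point $i/2^n$ all terms of the series with index $j \ge n$ vanish; the helper names are ours.

```python
from fractions import Fraction
import math

def K(x):
    """Periodic zigzag: distance from x to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)

def f_dyadic(x, n):
    """Value of the series at x = i/2^n; terms with j >= n vanish there."""
    return sum(K(2 ** j * x) / 2 ** j for j in range(n))

def slopes(x0, count):
    """Difference quotients r_n over the dyadic interval of length 2^-n containing x0."""
    result = []
    for n in range(count):
        i = math.floor(x0 * 2 ** n)
        alpha, beta = Fraction(i, 2 ** n), Fraction(i + 1, 2 ** n)
        result.append((f_dyadic(beta, n) - f_dyadic(alpha, n)) / (beta - alpha))
    return result

print(slopes(Fraction(1, 3), 8))   # the values alternate between 0 and 1
print(slopes(Fraction(1, 5), 8))   # consecutive values always differ by exactly 1
```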


Weierstrass’s Approximation Theorem

This is the fundamental proposition established by Weierstrass.

(Borel 1905, p. 50)

We have just digested the first Weierstrass surprise, which is the existence of
continuous functions without a derivative; now comes the second: we can make them
differentiable as often as we want, even polynomials, if only we allow an arbitrarily
small error $\varepsilon$.

(9.2) Theorem (Weierstrass 1885). Let $f : [a, b] \to \mathbb{R}$ be a continuous function.
For every $\varepsilon > 0$ there exists a polynomial $p(x)$ such that

$$|p(x) - f(x)| < \varepsilon \qquad \text{for all } x \in [a, b]. \tag{9.6}$$

In other terms, $f(x) - \varepsilon < p(x) < f(x) + \varepsilon$, i.e., the polynomial $p(x)$ is bounded
between $f(x) - \varepsilon$ and $f(x) + \varepsilon$ on the entire interval $[a, b]$.
The list of mathematicians, compiled from Borel (1905, p. 50) and Meinardus
(1964, p. 7), who provided proofs for this theorem, shows how much they were
fascinated by this result: Weierstrass (1885), Picard (1890, p. 259), Lerch 1892,
Volterra 1897, Lebesgue 1898, Mittag-Leffler 1900, Landau (1908), D. Jackson
1911, S. Bernstein 1912, P. Montel 1918, Marchand 1927, W. Gontscharov 1934.
This theorem, which is related to approximation by trigonometric polynomials,
has also been generalized in various ways (see Meinardus 1964, §2). The following
proof is based on the idea of, as we say today, “Dirac sequences”.

Dirac Sequences. We set, with Landau (1908, see Fig. 9.3a),

$$\varphi_n(x) = \begin{cases} \mu_n (1 - x^2)^n & \text{if } -1 \le x \le 1 \\ 0 & \text{otherwise,} \end{cases} \tag{9.7}$$

where the factor

$$\mu_n = \frac{1 \cdot 3 \cdot 5 \cdot 7 \cdots (2n + 1)}{2 \cdot 2 \cdot 4 \cdot 6 \cdots 2n} \tag{9.8}$$

is chosen such that

$$\int_{-1}^{1} \varphi_n(x)\,dx = 1 \tag{9.9}$$

(see Exercise II.4.3). These functions concentrate, for increasing $n$, more and
more of their “mass” at the origin:
(9.3) Lemma. Let $\varphi_n(x)$ be given by (9.7). For every $\varepsilon > 0$ and for every $\delta > 0$
with $0 < \delta < 1$ there exists an integer $N$ such that for all $n \ge N$ (see Fig. 9.3b)

$$1 - \varepsilon < \int_{-\delta}^{\delta} \varphi_n(x)\,dx \le 1, \tag{9.10}$$

$$\int_{-1}^{-\delta} \varphi_n(x)\,dx + \int_{\delta}^{1} \varphi_n(x)\,dx < \varepsilon. \tag{9.11}$$


Proof. We start with the proof of (9.11). Since $1 - x^2 \ge 1 - x$ for $0 \le x \le 1$, we
have $\int_0^1 (1 - x^2)^n\,dx \ge \int_0^1 (1 - x)^n\,dx = 1/(n + 1)$, and therefore $\mu_n \le \frac{1}{2}(n + 1)$.
Hence, we have for $\delta \le |x| \le 1$

$$0 \le \varphi_n(x) \le \varphi_n(\delta) \le \tfrac{1}{2}(n + 1) \cdot (1 - \delta^2)^n.$$

Now $q := 1 - \delta^2 < 1$ and $(1 - \delta^2)^n = q^n$ decreases exponentially, so that
$(n + 1) \cdot (1 - \delta^2)^n \to 0$ (see (6.26)). This implies that for $n$ sufficiently large
$0 \le \varphi_n(x) < \varepsilon/2$ for $\delta \le |x| \le 1$, and Eq. (9.11) is a consequence of Theorem
5.14. The estimate (9.10) is obtained by subtracting (9.11) from (9.9). □
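
The normalization and the bound on the factor $\mu_n$ used in this proof can be verified exactly for small $n$. The sketch below computes $\int_{-1}^{1} (1 - x^2)^n\,dx$ by expanding with the binomial theorem and integrating term by term (our choice of method), and works with rational numbers throughout.

```python
from fractions import Fraction
from math import comb

def mu(n):
    """mu_n = (1*3*5*...*(2n+1)) / (2*2*4*6*...*2n) as an exact fraction."""
    num, den = 1, 2
    for k in range(1, n + 1):
        num *= 2 * k + 1
        den *= 2 * k
    return Fraction(num, den)

def integral_bump(n):
    """Exact integral of (1 - x^2)^n over [-1, 1] via the binomial expansion."""
    return sum(Fraction((-1) ** k * comb(n, k) * 2, 2 * k + 1) for k in range(n + 1))

for n in (1, 2, 5, 10):
    # normalization mu_n * integral = 1, and the crude bound mu_n <= (n + 1)/2
    print(n, mu(n), mu(n) * integral_bump(n), mu(n) <= Fraction(n + 1, 2))

delta = 0.1
for n in (100, 1000, 5000):
    # the bound (n + 1)/2 * (1 - delta^2)^n on phi_n outside [-delta, delta] tends to 0
    print(n, 0.5 * (n + 1) * (1 - delta ** 2) ** n)
```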


A Proof of Weierstrass’s Approximation Theorem. We may assume that $0 <
a < b < 1$ (the general case is reduced to this one by a transformation of the form
$x \mapsto \alpha + \beta x$ with suitably chosen constants $\alpha$ and $\beta$). We then extend $f(x)$ to
a continuous function on $[0, 1]$, e.g., by putting $f(x) = f(a)$ for $0 \le x \le a$ and
$f(x) = f(b)$ for $b \le x \le 1$. Then, we set for $\xi \in [a, b]$

$$p_n(\xi) := \int_0^1 f(x)\,\varphi_n(x - \xi)\,dx = \mu_n \int_0^1 f(x)\bigl(1 - (x - \xi)^2\bigr)^n\,dx. \tag{9.12}$$

If we expand the factor $(1 - (x - \xi)^2)^n$ by the binomial theorem, we obtain a
polynomial in $\xi$ of degree $2n$, whose coefficients are functions of $x$. On inserting
it into (9.12), we see that $p_n(\xi)$ is a polynomial of degree $2n$.

Motivation. For a fixed $\xi \in [a, b]$ the function $\varphi_n(x - \xi)$ will have its peak shifted
to the point $\xi$ (Fig. 9.4). Hence, the product $f(x) \cdot \varphi_n(x - \xi)$ multiplies (more or
less) the peak by the value $f(\xi)$. We therefore expect, because of (9.9), that the
integral (9.12) will be close to $f(\xi)$.

Estimation of the Error. For the error between $p_n(\xi)$ and $f(\xi)$ we shall use the
triangle inequality as follows:

$$\begin{aligned} |p_n(\xi) - f(\xi)| \le{} & \Bigl| \int_0^1 f(x)\varphi_n(x-\xi)\,dx - \int_{\xi-\delta}^{\xi+\delta} f(x)\varphi_n(x-\xi)\,dx \Bigr| \\ & + \Bigl| \int_{\xi-\delta}^{\xi+\delta} f(x)\varphi_n(x-\xi)\,dx - \int_{\xi-\delta}^{\xi+\delta} f(\xi)\varphi_n(x-\xi)\,dx \Bigr| \\ & + \Bigl| f(\xi)\int_{\xi-\delta}^{\xi+\delta} \varphi_n(x-\xi)\,dx - f(\xi) \Bigr|. \end{aligned} \tag{9.13}$$
We fix some $\varepsilon > 0$. Since $f$ is continuous on $[0, 1]$, it is uniformly continuous
there (Theorem 4.5). Hence, there exists a $\delta > 0$ independent of $\xi$ such that

$$|f(x) - f(\xi)| < \varepsilon \qquad \text{if } |x - \xi| < \delta. \tag{9.14}$$

This $\delta$ is, if necessary, further reduced to satisfy $\delta \le a$ and $\delta \le 1 - b$. Hence, we
always have $[\xi - \delta, \xi + \delta] \subset [0, 1]$. Furthermore, the function $f(x)$ is bounded,
i.e., satisfies $|f(x)| \le M$ for $x \in [0, 1]$ (Theorem 3.6).

The three terms to the right of Eq. (9.13) can now be estimated as follows:
for the first one we use boundedness of $f(x)$ and Eq. (9.11) and we see that it
is bounded by $M\varepsilon$; similarly, the use of Eq. (9.10) shows that the third term is
bounded by $M\varepsilon$; finally, it follows from (9.14) and (9.9) that the second term is
bounded by $\varepsilon$. We thus have

$$|p_n(\xi) - f(\xi)| < (2M + 1)\varepsilon$$

for sufficiently large $n$. Since this estimate holds uniformly on $[a, b]$, the theorem
is proved. □
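
The convergence of the polynomials (9.12) can be observed without ever forming their coefficients. The sketch below evaluates $p_n(\xi)$ by a trapezoidal rule on $[0, 1]$; the test function $|x - 1/2|$, the quadrature, and the step count are our assumptions for illustration.

```python
def mu(n):
    """mu_n = (1/2) * prod_{k=1..n} (2k+1)/(2k), the factor (9.8) in floating point."""
    value = 0.5
    for k in range(1, n + 1):
        value *= (2 * k + 1) / (2 * k)
    return value

def landau_p(f, n, xi, steps=2000):
    """Trapezoidal approximation of p_n(xi) = mu_n * int_0^1 f(x) (1 - (x - xi)^2)^n dx."""
    h = 1.0 / steps

    def integrand(x):
        return f(x) * (1.0 - (x - xi) ** 2) ** n

    total = 0.5 * (integrand(0.0) + integrand(1.0))
    for k in range(1, steps):
        total += integrand(k * h)
    return mu(n) * h * total

def f(x):
    return abs(x - 0.5)  # hypothetical test function: continuous on [0, 1], kink at 1/2

for n in (10, 100, 1000):
    err = max(abs(landau_p(f, n, xi) - f(xi)) for xi in (0.25, 0.5, 0.75))
    print(n, err)  # the error shrinks slowly, the largest value occurring at the kink
```

The slow decay of the error at the kink illustrates that the theorem guarantees convergence but says nothing about its speed.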
(9.4) Example. Consider the function $f : [1/8, 7/8] \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} -3.2x + 0.8 & \text{if } 1/8 \le x \le 1/4, \\ \sqrt{1/64 - (x - 3/8)^2} & \text{if } 1/4 \le x \le 1/2, \\ -\sqrt{1/64 - (x - 5/8)^2} & \text{if } 1/2 \le x \le 3/4, \\ 7.6x - 5.7 & \text{if } 3/4 \le x \le 7/8. \end{cases}$$

As in the above proof, we extend it to a continuous function on $[0, 1]$. The polynomials
$p_n(\xi)$ of Eq. (9.12) are plotted in Fig. 9.5 for $n = 10$, 100, and 1000. We
can observe uniform convergence on $[1/8, 7/8]$ but not on $[0, 1]$. This is due to the
fact that for $\xi = 0$ or $\xi = 1$ half of the peak of $\varphi_n(x)$ is cut off in (9.12). The
hypothesis $0 < a < b < 1$ in the above proof can therefore not be omitted.

The graphs in Fig. 9.5 were actually computed by numerically evaluating the
integral in (9.12) for 400 values of $\xi$ by a method similar to those described in
Sect. II.6. It would be a waste of effort to calculate the 2000 coefficients of the
polynomial.



Exercises 

9.1 Show, with the help of Wallis’ product, that the factors $\mu_n$ in (9.8) behave,
for $n \to \infty$, asymptotically as $\sqrt{n/\pi}$, and that the estimation in the proof of
Lemma 9.3 is a little crude.
9.2 Show that
(9.16) 


Vn{x) 




n = 1,2,3,... 


is a Dirac sequence, i.e., satisfies (9.9), (9.10), and (9.11) (we shall see in
Sect. IV.5 that $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$). This was actually the sequence on
which Weierstrass based his proof.

9.3 Find the constants c n such that 


(9.17) 



-1 < x < 1, 
otherwise 


is a Dirac sequence (see Exercise 5.6). 

This sequence, with the help of trigonometric formulas like (I.4.4'), leads to
approximations on $[-\pi, \pi]$ by trigonometric polynomials.


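9.4 Let

$$\varphi_n(x) = \begin{cases} n & \text{if } |x| \le 1/(2n), \\ 0 & \text{otherwise.} \end{cases}$$

Show that for every continuous function $f(x)$

$$\lim_{n\to\infty} \int_a^b \varphi_n(x - \xi) f(x)\,dx = f(\xi)$$

for all $a < \xi < b$.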


9.5 Expand $(1 - (x - \xi)^2)^3$ in powers of $\xi$ and show that
f 1 4 + cos(x 4 + y/x) — sin(3æ) 


721n(a; + l)4 


1 — (x — £) 2 j dx 


is a polynomial in $\xi$.



IV 

Calculus in Several Variables 



Drawing by K. Wanner 

The influence of physics in stimulating the creation of such mathematical 
entities as quaternions, Grassmann’s hypernumbers, and vectors should be 
noted. These creations became part of mathematics. 

(M. Kline 1972, p. 791) 

Functions of several variables have their origin in geometry (e.g., curves
depending on parameters (Leibniz 1694a)) and in physics. A famous problem
throughout the 18th century was the calculation of the movement of a vibrating
string (d’Alembert 1748, Fig. 0.1). The position of a string $u(x, t)$ is actually a
function of $x$, the space coordinate, and of $t$, the time. An important breakthrough
for the systematic study of several variables, which occurred around the middle of
the 19th century, was the idea of denoting pairs (then $n$-tuples)

$$(x_1, x_2) =: x, \qquad (x_1, x_2, \ldots, x_n) =: x$$

by a single letter and of considering them as new mathematical objects. They were
called “extensive Grösse” by Grassmann (1844, 1862), “complexes” by Peano
(1888), and “vectors” by Hamilton (1853).



FIGURE 0.1. Movement of a vibrating string (harpsichord)


The first section, IV.1, will introduce norms in $n$-dimensional spaces, which
enable us to extend the definitions and theorems on convergence and continuity
quite easily (Section IV.2). However, differential calculus (Sections IV.3 and IV.4)
as well as integral calculus (Section IV.5) in several variables will lead to new
difficulties (interchange of partial derivatives, of integrations, and of integrations
with derivatives).

IV.1 Topology of n-Dimensional Space


It may appear remarkable that this idea, which is so simple and consists basically
in considering a multiple expression of different magnitudes (such
as the “extensive magnitudes” in the sequel) as a new independent magnitude,
should in fact develop into a new science; . . .

(Grassmann 1862, Ausdehnungslehre, p. 5)

. . . it is very useful to consider “complex” numbers, or numbers formed
with several units, . . . (Peano 1888a, Math. Ann., vol. 32, p. 450)

We denote pairs of real numbers by $(x_1, x_2)$, $n$-tuples by $(x_1, x_2, \ldots, x_n)$, and
call them vectors. The set of all pairs is

$$\mathbb{R}^2 = \mathbb{R} \times \mathbb{R} = \{(x_1, x_2)\,;\ x_1, x_2 \in \mathbb{R}\} \tag{1.1}$$

and the set of all $n$-tuples is denoted by

$$\mathbb{R}^n = \mathbb{R} \times \mathbb{R} \times \ldots \times \mathbb{R} = \{(x_1, x_2, \ldots, x_n)\,;\ x_k \in \mathbb{R},\ k = 1, \ldots, n\}. \tag{1.2}$$

Vectors can be added (componentwise) and multiplied by a real number. With
these operations, we call $\mathbb{R}^n$ an $n$-dimensional real vector space.


Distances and Norms 

The two-dimensional space $\mathbb{R}^2$ can be imagined as a plane, the components $x_1$ and
$x_2$ being the cartesian coordinates. The distance between two points $x = (x_1, x_2)$
and $y = (y_1, y_2)$ is, by Pythagoras’s Theorem, given by (Fig. 1.1)

$$d(x, y) = \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2}. \tag{1.3}$$

This distance only depends on the difference $y - x$ and is also denoted by $\|y - x\|_2$,
where $\|z\|_2 = \sqrt{z_1^2 + z_2^2}$ and $z = (z_1, z_2)$.



FIGURE 1.1. Distance 


FIGURE 1.2. Distance ir 


In three-dimensional space, the distance between $x = (x_1, x_2, x_3)$ and $y =
(y_1, y_2, y_3)$ is obtained by applying Pythagoras’s Theorem twice (first to the triangle
DEF and then to ABC, see Fig. 1.2). In this way, we get $d(x, y) = \|y - x\|_2$,
where $\|z\|_2 = \sqrt{z_1^2 + z_2^2 + z_3^2}$.

For $n$-dimensional space $\mathbb{R}^n$ we define, by analogy,

$$\|z\|_2 = \sqrt{z_1^2 + z_2^2 + \ldots + z_n^2}, \tag{1.4}$$

and call it the Euclidean norm of $z = (z_1, z_2, \ldots, z_n)$. The distance between
$x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ is then given by $d(x, y) = \|y - x\|_2$.
(1.1) Theorem. The Euclidean norm (1.4) has the following properties:

(N1) $\|x\| \ge 0$ and $\|x\| = 0 \iff x = 0$,

(N2) $\|\lambda x\| = |\lambda| \cdot \|x\|$ for $\lambda \in \mathbb{R}$,

(N3) $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).


Proof. Property (N1) is trivial. Since $\lambda x = (\lambda x_1, \ldots, \lambda x_n)$, we have $\|\lambda x\|_2^2 =
(\lambda x_1)^2 + \ldots + (\lambda x_n)^2 = |\lambda|^2 \cdot \|x\|_2^2$, which proves (N2). For the proof of (N3)
we compute

$$\|x + y\|_2^2 = \sum_{k=1}^n (x_k + y_k)^2 = \sum_{k=1}^n x_k^2 + 2\sum_{k=1}^n x_k y_k + \sum_{k=1}^n y_k^2 \le \|x\|_2^2 + 2\|x\|_2 \|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2.$$

□

Remark. In the above proof, we have used the estimate

$$\sum_{k=1}^n x_k y_k \le \|x\|_2 \cdot \|y\|_2, \tag{1.5}$$

which is known as the Cauchy-Schwarz inequality. It is obtained from $\sum_{k=1}^n (x_k -
\gamma y_k)^2 \ge 0$ in exactly the same way as (III.5.19). With the notation

$$\langle x, y \rangle := \sum_{k=1}^n x_k y_k \tag{1.6}$$

for the scalar product of the two vectors $x$ and $y$, inequality (1.5) can be written
more briefly as

$$|\langle x, y \rangle| \le \|x\|_2 \cdot \|y\|_2. \tag{1.5'}$$
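
A quick numerical sanity check of the Cauchy-Schwarz inequality and of the triangle inequality for the Euclidean norm is sketched below; the helper names `dot` and `norm2` and the random test vectors are ours.

```python
import math
import random

def dot(x, y):
    """Scalar product <x, y> = sum of x_k * y_k."""
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    """Euclidean norm, the square root of <x, x>."""
    return math.sqrt(dot(x, x))

random.seed(0)
for _ in range(5):
    x = [random.uniform(-1, 1) for _ in range(4)]
    y = [random.uniform(-1, 1) for _ in range(4)]
    s = [a + b for a, b in zip(x, y)]
    # both comparisons should print True
    print(abs(dot(x, y)) <= norm2(x) * norm2(y), norm2(s) <= norm2(x) + norm2(y))
```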


In the sequel, we rarely need the explicit formula of Eq. (1.4). We shall usually
just use the properties (N1) through (N3).

(1.2) Definition. A mapping $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$, which satisfies (N1), (N2), and (N3),
is called a norm on $\mathbb{R}^n$. The space $\mathbb{R}^n$, together with a norm, is called a normed
vector space.
Examples (Jordan 1882, Cours d’Analyse, vol. I, p. 18, Peano 1890b, footnote on
p. 186, Fréchet 1906). Besides the Euclidean norm (1.4), we have

$$\|x\|_1 = \sum_{k=1}^n |x_k| \qquad \ell_1\text{-norm}, \tag{1.7}$$

$$\|x\|_\infty = \max_{k=1,\ldots,n} |x_k| \qquad \text{maximum norm}, \tag{1.8}$$

$$\|x\|_p = \Bigl(\sum_{k=1}^n |x_k|^p\Bigr)^{1/p} \qquad \ell_p\text{-norm},\ p \ge 1. \tag{1.9}$$

The verification of properties (N1) and (N2) for all these norms and the verification
of (N3) for (1.7) and (1.8) are easy. We will see later (“Hölder’s inequality”,
see (4.42)) that the triangle inequality (N3) also holds for (1.9) for any $p \ge 1$.

(1.3) Theorem. For any $x \in \mathbb{R}^n$, we have

$$\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n \cdot \|x\|_\infty. \tag{1.10}$$


Proof. We only prove the second inequality (the proof of the others is very
easy and therefore omitted). Taking the square $\|x\|_1^2$ in Eq. (1.7) and multiplying
out, we obtain the sum of squares $x_k^2$ (which is $\|x\|_2^2$) and the mixed products
$|x_k| \cdot |x_l|$, which all are non-negative. This implies that $\|x\|_1^2 \ge \|x\|_2^2$. □
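
The chain of inequalities $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n \cdot \|x\|_\infty$ is easy to test on sample vectors. The following sketch uses helper names of our own choosing.

```python
import random

def norm_1(x):
    return sum(abs(c) for c in x)

def norm_2(x):
    return sum(c * c for c in x) ** 0.5

def norm_inf(x):
    return max(abs(c) for c in x)

random.seed(1)
for _ in range(5):
    x = [random.uniform(-10, 10) for _ in range(6)]
    n = len(x)
    chain = (norm_inf(x), norm_2(x), norm_1(x), n * norm_inf(x))
    # each tuple should be nondecreasing from left to right
    print(chain, all(chain[k] <= chain[k + 1] for k in range(3)))
```

Equality is possible at each step, for example for a vector with a single nonzero component in the first two inequalities.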

Each of these norms can be minorized or majorized (up to a positive factor)
by each of the others. This shows that the norms $\|x\|_1$, $\|x\|_2$, and $\|x\|_\infty$ are
equivalent in the sense of the following definition.

(1.4) Definition. Two norms $\|\cdot\|_p$ and $\|\cdot\|_q$ are called equivalent if there exist
positive constants $C_1$ and $C_2$ such that

$$C_1 \cdot \|x\|_p \le \|x\|_q \le C_2 \cdot \|x\|_p \qquad \text{for all } x \in \mathbb{R}^n. \tag{1.11}$$


Convergence of Vector Sequences 

Our next aim is to extend the definitions and results of Sect. III.1 to infinite
sequences of vectors. We consider $\{x_i\}_{i \ge 1}$, where each $x_i$ is itself a vector, i.e.,

$$x_i = (x_{1i}, x_{2i}, \ldots, x_{ni}), \qquad i = 1, 2, 3, \ldots. \tag{1.12}$$

(1.5) Definition. We say that the sequence $\{x_i\}_{i \ge 1}$, given by (1.12), converges to
the vector $a = (a_1, a_2, \ldots, a_n) \in \mathbb{R}^n$ if

$$\forall \varepsilon > 0 \quad \exists N \ge 1 \quad \forall i \ge N \qquad \|x_i - a\| < \varepsilon.$$

As in the one-dimensional case, we then write $\lim_{i\to\infty} x_i = a$.

This is exactly the same definition as in (III.1.4), except that “absolute values”
are replaced by “norms”.

(1.6) Remark. In order to be precise, one has to specify the norm used in Definition 1.5,
e.g., the Euclidean norm. But if $\|\cdot\|_p$ is equivalent to $\|\cdot\|_q$, then we
have

$$\text{convergence in } \|\cdot\|_p \iff \text{convergence in } \|\cdot\|_q. \tag{1.13}$$

Indeed, $\|x_i - a\|_p < \varepsilon$ and (1.11) imply that $\|x_i - a\|_q < C_2 \varepsilon$. Since $\varepsilon > 0$
is arbitrary in Definition 1.5, we can replace it by $\varepsilon' = C_2 \varepsilon$ and we see that
convergence in $\|\cdot\|_p$ implies convergence in $\|\cdot\|_q$.

Theorem 1.3 shows that $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ are equivalent, and later
(Theorem 2.4) we shall see that all norms in $\mathbb{R}^n$ are equivalent. Therefore, we
may take any norm in Definition 1.5 and the convergence of $\{x_i\}$ is independent
of the chosen norm.

(1.7) Theorem. For a vector sequence (1.12) we have

$$\lim_{i\to\infty} x_i = a \iff \lim_{i\to\infty} x_{ki} = a_k \quad \text{for } k = 1, 2, \ldots, n,$$

i.e., convergence in $\mathbb{R}^n$ means componentwise convergence.
Proof. For the maximum norm (1.8) we have

$$\|x_i - a\|_\infty < \varepsilon \iff |x_{ki} - a_k| < \varepsilon \quad \text{for } k = 1, 2, \ldots, n. \tag{1.14}$$

On choosing $\|\cdot\|_\infty$ in Definition 1.5, we obtain the statement. □
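
Componentwise convergence can be made concrete with a small experiment. In the sketch below, the sequence $x_i = (1/i,\ 1 + 2^{-i})$ in $\mathbb{R}^2$ with limit $a = (0, 1)$ is a hypothetical example of ours; the code measures the error in the maximum norm and finds, for a given $\varepsilon$, the first index from which the error stays below $\varepsilon$ (the error is decreasing for this sequence).

```python
def x(i):
    """Hypothetical sequence in R^2 with limit a = (0, 1)."""
    return (1.0 / i, 1.0 + 0.5 ** i)

a = (0.0, 1.0)

def max_norm_error(i):
    """||x_i - a|| in the maximum norm: the largest componentwise deviation."""
    return max(abs(c - ak) for c, ak in zip(x(i), a))

def first_index_below(eps):
    """Smallest i with ||x_i - a||_inf < eps (valid as N since the error decreases)."""
    i = 1
    while max_norm_error(i) >= eps:
        i += 1
    return i

for eps in (0.1, 0.01, 0.001):
    print(eps, first_index_below(eps))
```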

With these preparations, it is easy to transcribe the other definitions and results
of Sect. III.1 to the higher dimensional case. For example, we call a sequence
$\{x_i\}_{i \ge 1}$ of vectors bounded, if there exists a number $B \ge 0$ such that $\|x_i\| \le B$
for all $i \ge 1$. Again, boundedness is independent of the chosen norm. As in Theorem
III.1.3, we see that convergent vector sequences are bounded.

A sequence $\{x_i\}_{i \ge 1}$ is called a Cauchy sequence if

$$\forall \varepsilon > 0 \quad \exists N \ge 1 \quad \forall i \ge N \quad \forall \ell \ge 1 \qquad \|x_i - x_{i+\ell}\| < \varepsilon. \tag{1.15}$$

Using the maximum norm in (1.15), this is seen to be equivalent to the fact that, for
$k = 1, \ldots, n$, the real sequences $\{x_{ki}\}_{i \ge 1}$ are Cauchy sequences. Consequently,
we immediately obtain the following extension of Theorem III.1.8.

(1.8) Theorem. A sequence of vectors in $\mathbb{R}^n$ is convergent, if and only if it is a
Cauchy sequence. □

The generalization of the Bolzano-Weierstrass theorem is somewhat more
complicated.

(1.9) Theorem (Bolzano-Weierstrass). Every bounded sequence of vectors in $\mathbb{R}^n$
possesses a convergent subsequence.

Proof. Let $\{x_i\}_{i \ge 1}$ be our bounded sequence. We first consider the sequence
$\{x_{1i}\}_{i \ge 1}$ of first components. It is also a bounded sequence, and by Theorem
III.1.17, we can extract a convergent subsequence, say,

$$x_{1,1},\ x_{1,5},\ x_{1,9},\ x_{1,22},\ x_{1,37},\ x_{1,58},\ x_{1,238},\ x_{1,576},\ \ldots. \tag{1.16}$$

We then consider the second components. The main idea, however, consists in
considering them only for the subsequence corresponding to (1.16) and not for
the whole sequence. This sequence is bounded, and we can again apply Theorem
III.1.17 to find a convergent subsequence, say,

$$x_{2,1},\ x_{2,9},\ x_{2,58},\ x_{2,576},\ \ldots. \tag{1.17}$$

Now, the sequence $x_1, x_9, x_{58}, x_{576}, \ldots$ converges in the first and in the second
component. For $n = 2$ the proof is complete. Otherwise, we consider the third
components corresponding to (1.17), and so on. After the $n$th extraction of a subsequence,
there are still infinitely many terms left and we have a sequence that
converges in all components. □
Neighborhoods, Open and Closed Sets

By “set” we mean the entity M formed by gathering together certain defi- 
nite and distinguishable objects m of our intuition or of our thought. These 
objects are called the “elements” of M. 

(G. Cantor 1895, Werke, p. 282)

No one shall expel us from the paradise that Cantor has created for us. 

(Hilbert, Math. Ann., vol. 95, p. 170) 

A new mathematical era began when Dedekind (about 1871) and Cantor (about 
1875) considered sets of points as new mathematical objects. 

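For sets $A$, $B$ in $\mathbb{R}^n$ we shall use the symbols

$$A \subset B \quad \text{if all elements of } A \text{ also belong to } B, \tag{1.18}$$

$$A \cap B = \{x \in \mathbb{R}^n\,;\ x \in A \text{ and } x \in B\}, \tag{1.19}$$

$$A \cup B = \{x \in \mathbb{R}^n\,;\ x \in A \text{ or } x \in B\}, \tag{1.20}$$

$$A \setminus B = \{x \in \mathbb{R}^n\,;\ x \in A \text{ but } x \notin B\}, \tag{1.21}$$

$$\complement A = \{x \in \mathbb{R}^n\,;\ x \notin A\}. \tag{1.22}$$

The role of open intervals is played by

$$B_\varepsilon(a) = \{x \in \mathbb{R}^n\,;\ \|x - a\| < \varepsilon\}, \tag{1.23}$$

which we call a disc (or ball) of radius $\varepsilon$ and center $a$ (see Fig. 1.4).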



(1.10) Definition (Hausdorff 1914, Chap. VII, §1; see also p. 456). Let $a \in \mathbb{R}^n$ be
given. A set $V \subset \mathbb{R}^n$ is called a neighborhood of $a$, if there exists an $\varepsilon > 0$ such
that $B_\varepsilon(a) \subset V$.

The discs $B_\varepsilon(a)$ depend on the norm ($\|\cdot\|_1$, $\|\cdot\|_2$, or $\|\cdot\|_\infty$, …); the definition
of a “neighborhood”, however, is independent of the norm used, provided that the
norms are equivalent. Each $B_\varepsilon(a)$ corresponding to one norm will always contain
a $B_{\varepsilon'}(a)$ for any other norm (Fig. 1.5).

(1.11) Definition (Weierstrass, Hausdorff 1914, p. 215). A set $U \subset \mathbb{R}^n$ is open
(originally: “ein Gebiet”) if $U$ is a neighborhood of each of its points, i.e.,

$$U \text{ open} \quad :\Longleftrightarrow \quad \forall x \in U \ \ \exists \varepsilon > 0 \quad B_\varepsilon(x) \subset U.$$
FIGURE 1.5. Neighborhoods


(1.12) Definition (G. Cantor 1884, p. 470; see Ges. Abhandlungen, p. 226). A set
$F \subset \mathbb{R}^n$ is closed if each convergent sequence $\{x_i\}_{i \ge 1}$ with $x_i \in F$ has its limit
point in $F$, i.e.,

$$F \text{ closed} \quad :\Longleftrightarrow \quad a = \lim_{i\to\infty} x_i \text{ and } x_i \in F \text{ imply } a \in F.$$


Examples in $\mathbb{R}$. The so-called “open interval” $(a, b) = \{x \in \mathbb{R}\,;\ a < x < b\}$ is an
open set. Indeed, for every $x \in (a, b)$ the number $\varepsilon = \min(x - a, b - x)$ is strictly
positive and we have $B_\varepsilon(x) \subset (a, b)$. On the other hand, the sequence $\{a + 1/i\}$
(for $i \ge 1$) is convergent, its elements lie in $(a, b)$ for sufficiently large $i$, but its
limit is not in $(a, b)$. Therefore, the set $(a, b)$ is not closed.

The set $[a, b] = \{x \in \mathbb{R}\,;\ a \le x \le b\}$ is closed (see Theorem III.1.6).
However, neither $a$ nor $b$ have a neighborhood that is entirely in $[a, b]$. Hence,
$[a, b]$ is not open.

The interval $A = [a, b)$ is neither open nor closed, because $a$ has no neighborhood
lying in $[a, b)$ and the limit of the convergent sequence $\{b - 1/i\}$ is not
in $[a, b)$.

Finally, the set $\mathbb{R} = (-\infty, +\infty)$ is both open and closed, and so is the empty
set $\emptyset$.

(1.13) Lemma.

a) The set $A = \{x \in \mathbb{R}^n\,;\ \|x\| < 1\}$ is open.

b) The set $A = \{x \in \mathbb{R}^n\,;\ \|x\| \le 1\}$ is closed.


Proof. a) For $a \in A$ we take $\varepsilon = 1 - \|a\|$, which is positive. With this choice, we
have $B_\varepsilon(a) \subset A$ (see Fig. 1.6), since, with the use of the triangle inequality, we
have for $x \in B_\varepsilon(a)$ that

$$\|x\| = \|x - a + a\| \le \|x - a\| + \|a\| < \varepsilon + \|a\| = 1.$$

Hence, $A$ is open.
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p= 1.5 I p = 2. 

FIGURE 1.6. Open sc 


P = 3. 

x € R 2 ; |x | p 



p = 100. 



FIGURE 1.7. Closed sets {xel 2 ; ||æ|| p < l} 


b) Consider a sequence $\{x_i\}_{i \ge 1}$ satisfying $x_i \in A$ (for all $i$) and converging
to $a$. We have to show that $a \in A$. Suppose the contrary, $a \notin A$ (i.e., $\|a\| > 1$,
see Fig. 1.7), and take $\varepsilon = \|a\| - 1$. For this $\varepsilon$ there exists an $N \ge 1$ such that
$\|x_i - a\| < \varepsilon$ for $i \ge N$. Using the triangle inequality (or better yet Exercise 1.1),
we deduce

$$\|x_i\| = \|x_i - a + a\| \ge \|a\| - \|x_i - a\| > \|a\| - \varepsilon = 1$$

for sufficiently large $i$. This contradicts the fact that $x_i \in A$ for all $i$. Hence,
$A = \{x \in \mathbb{R}^n ; \|x\| \le 1\}$ is closed. □

Further Examples. The set $A = \{x \in \mathbb{R}^2 ; x_1, x_2 \in \mathbb{Q}, \|x\| \le 1\}$ is neither
open nor closed. Indeed, each disc contains irrational points and a limit of rational
points can be irrational.



FIGURE 1.8. Cantor set 


The famous Cantor set (1883, see Werke, p. 207, Example 11; Fig. 1.8) is
given by

(1.24) $A = [0, 1] \setminus \{(1/3, 2/3) \cup (1/9, 2/9) \cup (7/9, 8/9) \cup \ldots\}$.

It is not open (e.g., x = 1/3 has no neighborhood in A), but is closed (see Remark 
1.16 below). 
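The construction of the Cantor set can be followed step by step on a computer. The sketch below is only an illustration under our own naming (the helpers `cantor_intervals` and `in_stage` are not from the text): it keeps the closed intervals that survive the removal of the open middle thirds and uses exact fractions, so that endpoints such as 1/3 are represented without rounding.

```python
from fractions import Fraction


def cantor_intervals(steps):
    """Closed intervals left after removing open middle thirds `steps` times."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(steps):
        refined = []
        for left, right in intervals:
            third = (right - left) / 3
            refined.append((left, left + third))    # keep the left third
            refined.append((right - third, right))  # keep the right third
        intervals = refined
    return intervals


def in_stage(x, steps):
    """True if x has not been removed during the first `steps` removals."""
    return any(left <= x <= right for left, right in cantor_intervals(steps))


if __name__ == "__main__":
    stage = cantor_intervals(3)
    print(len(stage), sum(right - left for left, right in stage))
    print(in_stage(Fraction(1, 3), 5), in_stage(Fraction(1, 3) + Fraction(1, 1000), 5))
```

The point 1/3 survives every stage, while points slightly to its right are removed in the first step; this is the numerical counterpart of the statement that 1/3 has no neighborhood contained in A.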
“Sierpiński’s triangle” (Fig. 1.9) and “Sierpiński’s carpet” (Fig. 1.10) (Sierpiński
1915, 1916) are two-dimensional generalizations of Cantor’s set. The drawings in
Figs. 1.9 and 1.10 are not only charming because of their aesthetic appeal, but
remind us as well that sets can be rather complicated objects.



FIGURE 1.9. Sierpiński’s triangle



FIGURE 1.10. Sierpiński’s carpet


(1.14) Theorem. We have

i) $F$ closed $\Longrightarrow$ $\complement F$ open,

ii) $U$ open $\Longrightarrow$ $\complement U$ closed.

Proof. i) Suppose that $\complement F$ is not open. Then there exists an $a \in \complement F$ (i.e., $a \notin F$)
such that for all $\varepsilon > 0$ we have $B_\varepsilon(a) \not\subset \complement F$. Taking $\varepsilon = 1/i$, we can choose a
sequence $\{x_i\}_{i \ge 1}$ satisfying $x_i \in F$ and $\|x_i - a\| < 1/i$. Since $F$ is closed, we
have $a \in F$, a contradiction.

ii) Suppose that $\complement U$ is not closed. This means that there exists a sequence
$x_i \in \complement U$ (i.e., $x_i \notin U$) converging to an $a \notin \complement U$ (i.e., $a \in U$). Since $U$ is open,
we have $B_\varepsilon(a) \subset U$ for an $\varepsilon > 0$. Thus, $x_i \notin B_\varepsilon(a)$ for all $i$, a contradiction. □

(1.15) Theorem (Hausdorff 1914, p. 216). For a finite number of sets, we have

i) $U_1, U_2, \ldots, U_m$ open $\Longrightarrow$ $U_1 \cap U_2 \cap \ldots \cap U_m$ is open,

ii) $F_1, F_2, \ldots, F_m$ closed $\Longrightarrow$ $F_1 \cup F_2 \cup \ldots \cup F_m$ is closed.

For an arbitrary family of sets (with index set $\Lambda$), we have

iii) $U_\lambda$ open for all $\lambda$ $\Longrightarrow$ $\bigcup_{\lambda \in \Lambda} U_\lambda = \{x \in \mathbb{R}^n ; \exists \lambda \in \Lambda, \ x \in U_\lambda\}$ is open,

iv) $F_\lambda$ closed for all $\lambda$ $\Longrightarrow$ $\bigcap_{\lambda \in \Lambda} F_\lambda = \{x \in \mathbb{R}^n ; \forall \lambda \in \Lambda, \ x \in F_\lambda\}$ is
closed.


Proof. We begin with the proof of (i). Let $x \in U_1 \cap \ldots \cap U_m$ so that $x \in U_k$ for all
$k = 1, \ldots, m$. Since $U_k$ is open, there exists an $\varepsilon_k > 0$ such that $B_{\varepsilon_k}(x) \subset U_k$.
With $\varepsilon = \min(\varepsilon_1, \ldots, \varepsilon_m)$, we have found a positive $\varepsilon$ such that
$B_\varepsilon(x) \subset U_1 \cap \ldots \cap U_m$.

The proof of (iii) is even easier and hence omitted. The equivalences (i) $\Longleftrightarrow$
(ii) and (iii) $\Longleftrightarrow$ (iv) are obtained from the “de Morgan rules”

$$\complement(U_1 \cap U_2) = (\complement U_1) \cup (\complement U_2)$$

$$\complement(U_1 \cup U_2) = (\complement U_1) \cap (\complement U_2),$$

together with Theorem 1.14. □

(1.16) Remark. With this theorem, we see that the Cantor set of Eq. (1.24) is
closed. Indeed, its complement

$$\complement A = (-\infty, 0) \cup (1, \infty) \cup (1/3, 2/3) \cup (1/9, 2/9) \cup (7/9, 8/9) \cup \ldots$$

is an infinite union of open intervals and thus open by Theorem 1.15.

(1.17) Remark. The statements (i) and (ii) of Theorem 1.15 are not true in general
for an infinite number of sets.

Consider, for example, the family of open sets

(1.26) $U_i = \{x \in \mathbb{R}^2 ; \|x\| < 1 + 1/i\}$,

whose intersection $U_2 \cap U_3 \cap U_4 \cap \ldots = \{x \in \mathbb{R}^2 ; \|x\| \le 1\}$ is not open
(Fig. 1.11).

Similarly, the family of closed sets (Fig. 1.12)

(1.27) $F_i = \{x \in \mathbb{R}^2 ; \|x\| \le 1 - 1/i\}$

has a union $F_2 \cup F_3 \cup F_4 \cup \ldots = \{x \in \mathbb{R}^2 ; \|x\| < 1\}$, which is not closed.


Compact Sets 

We have already pointed out and will recognize throughout this book the
importance of compact sets. All those concerned with general analysis have
seen that it is impossible to do without them.

(Fréchet 1928, Espaces abstraits, p. 66)

(1.18) Definition (Fréchet 1906). A set $K \subset \mathbb{R}^n$ is compact if for each sequence
$\{x_i\}_{i \ge 1}$ with elements in $K$ there exists a subsequence that converges to some
element $a \in K$.

(1.19) Theorem. For $K \subset \mathbb{R}^n$ we have

$$K \text{ compact} \quad \Longleftrightarrow \quad K \text{ bounded and closed.}$$


Proof. Let $K$ be bounded (i.e., $\|x\| \le B$ for all $x \in K$) and closed. We then take
a sequence $\{x_i\}_{i \ge 1}$ with elements in $K$. This sequence is bounded and has, by
Theorem 1.9, a convergent subsequence. The limit of this subsequence lies in $K$,
because $K$ is closed. Hence, $K$ is compact.

On the other hand, let $K$ be a compact set. This implies that $K$ is closed,
because every subsequence of a convergent sequence converges to the same limit.
In order to see that $K$ is bounded, we assume the contrary, i.e., the existence of
a sequence $\{x_i\}$ satisfying $x_i \in K$ for all $i$ and $\|x_i\| \to \infty$. Obviously, it is
impossible to extract a convergent subsequence, so that $K$ cannot be compact in
this case. □

(1.20) Remark. Compact sets are, by Definition 1.18, precisely the sets in which 
the Bolzano-Weierstrass theorem can be applied. Since this theorem is the basis 
for all deep results on uniform convergence, uniform continuity, maximum and 
minimum, Fréchet was not exaggerating (see quotation). 

(1.21) Theorem (Heine 1872, Borel 1895). Let $K$ be compact and let $\{U_\lambda\}_{\lambda \in \Lambda}$
be a family of open sets $U_\lambda$ with

(1.28) $\bigcup_{\lambda \in \Lambda} U_\lambda \supset K$ (open covering).

Then, there exists a finite number of indices $\lambda_1, \lambda_2, \ldots, \lambda_m$ such that

$$U_{\lambda_1} \cup U_{\lambda_2} \cup \ldots \cup U_{\lambda_m} \supset K.$$

Counterexamples. Before proceeding to the proof of this theorem, we show that
none of the assumptions may be omitted.

In the example

$$K = \{x ; \|x\| < 1\}, \qquad U_i = \{x ; \|x\| < 1 - 1/i\}, \quad i = 1, 2, \ldots,$$

it is not possible to find a finite covering of $K$. This is due to the fact that $K$ is not
closed.

In the situation

$$K = \mathbb{R}^n, \qquad U_i = \{x ; \|x\| < i\}, \quad i = 1, 2, \ldots,$$

the set $K$ is not bounded. Again, it is not possible to find a finite covering of $K$.
Hence, the boundedness of $K$ is essential.

In our last example, we consider the compact set $K = \{x ; \|x\| \le 1\}$, but
we consider nonopen sets $U_i$ given by

$$\{(r \cos\varphi, r \sin\varphi) ; \ 0 \le r \le 1, \ \ldots \le \varphi \le \ldots\}.$$

None of the $U_i$ is superfluous in the covering $\{U_i\}_{i \ge 1}$ (Fig. 1.13).



FIGURE 1.13. Non open covering of K        FIGURE 1.14. Heine’s proof


Proof. Following Heine (1872), we enclose the compact set $K$ in an $n$-dimensional
cube $I$ (a square for $n = 2$; see Fig. 1.14). Suppose that we need an infinite number
of $U_\lambda$ to cover $K$. The idea is to split $I$ into $2^n$ small cubes by halving its sides
(here, $I_1, I_2, I_3, I_4$). One of the sets $K \cap I_j$ ($j = 1, \ldots, 2^n$) requires an infinite
number of $U_\lambda$ in order to be covered. We assume that this is $K \cap I_\ell$ and denote it by
$K_1$. Again we split $I_\ell$ into $2^n$ small cubes, and so on. We thus obtain a sequence
of sets

$$K \supset K_1 \supset K_2 \supset K_3 \supset \ldots,$$

each of which requires an infinite number of $U_\lambda$ in order to be covered.

In each $K_i$, we choose an $x_i \in K_i$. The sequence $\{x_i\}$ is a Cauchy sequence,
because the diameter of the $K_i$ tends to zero. Therefore (Theorem 1.8), it converges
and we denote its limit by $a$. Since $K$ is compact (hence closed), we have
$a \in K$. By (1.28), there exists a $\lambda$ with $a \in U_\lambda$. Since this $U_\lambda$ is open, there exists
an $\varepsilon > 0$ with $B_\varepsilon(a) \subset U_\lambda$. Using again the fact that the diameter of the $K_i$ tends
to zero, we conclude that for sufficiently large $m$ we have $K_m \subset B_\varepsilon(a) \subset U_\lambda$.
Hence, $K_m$ is covered by one single $U_\lambda$. This contradicts the assumption that $K_m$
cannot be covered by a finite number of $U_\lambda$. □


Exercises 

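1.1 Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. Prove that

$$\bigl|\, \|x\| - \|y\| \,\bigr| \le \|x - y\|.$$

Hint. Apply the triangle inequality to $\|x\| = \|x - y + y\|$.

1.2 Show that

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2 \qquad \forall x \in \mathbb{R}^n.$$

Show that these estimates are “optimal”, i.e., if

$$c\,\|x\|_2 \le \|x\|_1 \le C\,\|x\|_2 \qquad \forall x \in \mathbb{R}^n,$$
then $c \le 1$ and $C \ge \sqrt{n}$.

1.3 Mr. C.L. Ever might have the idea of defining the “norm”

$$\|x\|_{1/2} = \Bigl(\sum_{i=1}^{n} |x_i|^{1/2}\Bigr)^{2}.$$

Show that this “norm” does not satisfy the triangle inequality. Study also the
set $B = \{x \in \mathbb{R}^2 ; \|x\|_{1/2} \le 1\}$ and show that it is not convex.

1.4 For each set $A$ in $\mathbb{R}^n$ define the interior $A^\circ$ of $A$ by

$$A^\circ = \{x \mid A \text{ is a neighborhood of } x\}$$
and the closure $\overline{A}$ of $A$ by

$$\overline{A} = \{x \mid A \text{ meets every neighborhood of } x\}.$$

Show that $\overline{A}$ is a closed set (in fact the smallest closed set containing $A$) and
that $A^\circ$ is an open set (the largest open set contained in $A$).

1.5 Show that for two sets $A$ and $B$ in $\mathbb{R}^n$

$$\overline{A \cup B} = \overline{A} \cup \overline{B}, \qquad (A \cap B)^\circ = A^\circ \cap B^\circ.$$

Find two sets $A$ and $B$ in $\mathbb{R}$ for which

$$\overline{A \cap B} \ne \overline{A} \cap \overline{B}, \qquad (A \cup B)^\circ \ne A^\circ \cup B^\circ.$$

1.6 (Sierpiński’s Triangle 1915). Let $a, b, c$ be three points in $\mathbb{R}^2$ forming an equilateral
triangle. Consider the set

$$T = \Bigl\{\lambda a + \mu b + \nu c \ ; \ \lambda = \sum_{i=1}^{\infty} \frac{\lambda_i}{2^i}, \ \mu = \sum_{i=1}^{\infty} \frac{\mu_i}{2^i}, \ \nu = \sum_{i=1}^{\infty} \frac{\nu_i}{2^i} \Bigr\},$$

where $\lambda_i, \mu_i, \nu_i$ are 0 or 1 such that $\lambda_i + \mu_i + \nu_i = 1$ for all $i$. Determine the
shape of $T$. Is it open? Closed? Compact?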

1.7 Show that 

INI = ^(N + I •'••il) + |max{|a;i|, \x 2 \} 

is a norm on $\mathbb{R}^2$. Determine for this norm the shape of the “unit disc”

$$B_1(0) = \{x \in \mathbb{R}^2 ; \|x\| < 1\}.$$

1.8 Show that the map $N : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$N(x_1, x_2) = \sqrt{a x_1^2 + 2b x_1 x_2 + c x_2^2}$$

is a norm on $\mathbb{R}^2$ if and only if $a > 0$ and $ac - b^2 > 0$.

1.9 Deduce the Bolzano-Weierstrass theorem from the Heine-Borel theorem.
Hint. Suppose that $\{x_n\}$ is a sequence with $\|x_n\| \le M$, with no accumulation
point. Then, for each $a$ with $\|a\| \le M$ there is an $\varepsilon > 0$ such that $B_\varepsilon(a)$
contains only a finite number of terms of the sequence $\{x_n\}$.

1.10 Prove that $\mathbb{R}^n$ and $\emptyset$ are the only subsets of $\mathbb{R}^n$ that are open and closed.

IV.2 Continuous Functions

. . . according to the judgment of all mathematicians, the difficulty that read- 
ers of this work experience is caused by the more philosophical than mathe- 
matical form of the text .... Now, to remove this difficulty was an essential 
task for me, if I wanted the book to be read and understood not only by 
myself, but also by others. 

(Grassmann 1862, “Professor am Gymnasium zu Stettin”) 
Let $A$ be a subset of $\mathbb{R}^n$. A function

(2.1) $f : A \to \mathbb{R}^m$

maps the vector $x = (x_1, \ldots, x_n) \in A$ to the vector $y = (y_1, \ldots, y_m) \in \mathbb{R}^m$.
Each component of $y$ is a function of $n$ independent variables. We thus write

(2.2) $y = f(x)$ or
$$\begin{aligned} y_1 &= f_1(x_1, \ldots, x_n) \\ &\ \ \vdots \\ y_m &= f_m(x_1, \ldots, x_n). \end{aligned}$$

Examples. a) One function ($m = 1$) of two variables ($n = 2$) can be interpreted
as a surface in $\mathbb{R}^3$. For example, the function $y = x_1^2 + x_2^2$ represents a paraboloid
(Fig. 2.1a).

b) Two functions ($m = 2$) of one variable ($n = 1$) represent a curve in $\mathbb{R}^3$.
For example, the spiral of Fig. 2.1b is given by $y_1 = \cos 10x$, $y_2 = \sin 10x$. If we
project the curve onto the $(y_1, y_2)$-plane, we obtain a “parametric representation”
of a curve in $\mathbb{R}^2$ (in our example a circle).

(2.1) Definition. A function $f : A \to \mathbb{R}^m$, $A \subset \mathbb{R}^n$ is continuous at $x_0 \in A$ if

$$\forall \varepsilon > 0 \ \ \exists \delta > 0 \ \ \forall x \in A : \ \|x - x_0\| < \delta \ \Longrightarrow \ \|f(x) - f(x_0)\| < \varepsilon.$$

This corresponds exactly to Definition III.3.2 with absolute values replaced
by norms. Our definition does not depend on the particular norms chosen, as long
as they are equivalent (by the same argument as in Remark 1.6). If we use the
maximum norm in $\mathbb{R}^m$, we find, in analogy to Theorem 1.7, the following result.

(2.2) Theorem. A function $f : A \to \mathbb{R}^m$, $A \subset \mathbb{R}^n$ given by (2.2) is continuous
at $x_0 \in A$ if and only if the function $f_j : A \to \mathbb{R}$ is continuous at $x_0$ for all
$j = 1, \ldots, m$. □

As a consequence of this theorem, only the case $m = 1$ has to be considered
for the study of continuity. A constant function $f(x) = c$ is obviously everywhere
continuous. The projection of $x = (x_1, \ldots, x_n)$ to the $k$th coordinate, i.e., $p(x) = x_k$,
is also continuous at every point $x_0 = (x_{10}, \ldots, x_{n0})$, since $|x_k - x_{k0}| \le \|x - x_0\|$
(choose $\delta = \varepsilon$ in Definition 2.1).

It is almost trivial to generalize the Definition III.3.10 of the limit of a function
and the statements of Theorems III.3.3 and III.3.4 to the case of several
variables as long as the product and the quotient make sense (just replace absolute
values by norms). Consequently, polynomials of several variables, e.g.,
f(x i,X 2 ,X 3 ) = x\x\ — x 1 X 2 ^ 3 + 4 x 2 — 1 , are continuous everywhere, and rational 
functions are continuous at points where the denominator does not vanish. 


FIGURE 2.2. Stereogram for discontinuous function $f(x_1, x_2)$ of Eq. (2.3) (hold the picture
close to the eyes (20 cm) and stare through the paper to an object 20 cm behind it. Then the
two images will merge and become 3D)


Example. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$, given by

(2.3)
$$y = f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2}{x_1^2 + x_2^2} & \text{if } x_1^2 + x_2^2 > 0 \\[2mm] 0 & \text{if } x_1 = x_2 = 0 \end{cases}$$

(see Fig. 2.2). It is continuous at points satisfying $x_1^2 + x_2^2 > 0$. In order to explain
its behavior close to the origin, we use polar coordinates $x_1 = r \cos\varphi$, $x_2 = r \sin\varphi$
so that (for $r > 0$)

$$y = \frac{r^2 \cos\varphi \sin\varphi}{r^2} = \frac{1}{2} \sin 2\varphi.$$

Hence, the function is constant on lines going through the origin, with the constant
depending on the angle $\varphi$. In each neighborhood of $(0, 0)$, the function (2.3)
assumes all values between $+1/2$ and $-1/2$. Therefore, it cannot be continuous
at $(0, 0)$.
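The behavior near the origin is easy to observe numerically. The following sketch (the helper name `along_ray` is ours, not the text's) evaluates the function of Eq. (2.3) at points with polar coordinates $(r, \varphi)$ and compares the values with $\frac{1}{2}\sin 2\varphi$ as $r$ shrinks.

```python
import math


def f(x1, x2):
    """The function of Eq. (2.3): x1*x2/(x1^2 + x2^2), and 0 at the origin."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)


def along_ray(phi, r):
    """Value of f at the point with polar coordinates (r, phi)."""
    return f(r * math.cos(phi), r * math.sin(phi))


if __name__ == "__main__":
    for phi in (0.0, math.pi / 8, math.pi / 4, 3 * math.pi / 4):
        values = [along_ray(phi, r) for r in (1.0, 1e-3, 1e-6)]
        # Each row stays (up to rounding) at 0.5 * sin(2 * phi), however small r is.
        print(phi, values, 0.5 * math.sin(2 * phi))
```

Along each ray the values do not approach $f(0,0) = 0$ unless $\sin 2\varphi = 0$, which is exactly the obstruction to continuity described above.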

The interest of this example is that the partial functions $x_1 \mapsto f(x_1, 0)$ and
$x_2 \mapsto f(0, x_2)$ are continuous also at the origin. Therefore, there is no analog
of Theorem 2.2 for the independent variables $x_i$, as Cauchy (1821, p. 37) actually
thought. He was corrected, with the above counterexample, by Peano (1884,
“Annotazione N. 99”).

Continuous Functions and Compactness 

We continue extending the results of Sect. III.3 to functions of several variables. 
Many of these extensions are straightforward. For example, the analog of Theo- 
rem III. 3. 6 is as follows: 

(2.3) Theorem. Let $K \subset \mathbb{R}^n$ be a compact set and let $f : K \to \mathbb{R}$ be continuous
on $K$. Then, $f$ is bounded on $K$ and admits a maximum and a minimum, i.e., there
exist $u \in K$ and $U \in K$ such that

$$f(u) \le f(x) \le f(U) \quad \text{for all } x \in K. \qquad □$$


This theorem leads to the following result, which we already announced in 
Remark 1.6. 

(2.4) Theorem. All norms in $\mathbb{R}^n$ are equivalent. This means that if $N : \mathbb{R}^n \to \mathbb{R}$
is a mapping satisfying the conditions (N1) through (N3) of Theorem 1.1, i.e.,
(N1) $N(x) \ge 0$ and $N(x) = 0 \Leftrightarrow x = 0$,

(N2) $N(\lambda x) = |\lambda|\, N(x)$ for $\lambda \in \mathbb{R}$,

(N3) $N(x + y) \le N(x) + N(y)$ (triangle inequality),
then there exist numbers $C_1 > 0$ and $C_2 > 0$ such that

(2.4) $C_1 \|x\|_2 \le N(x) \le C_2 \|x\|_2$ for all $x \in \mathbb{R}^n$.

Proof. We first show that $N(x)$ is continuous. We write $x = x_1 e_1 + x_2 e_2 + \ldots + x_n e_n$,
where $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, and so on. It then follows
from (N3), (N2), and the Cauchy-Schwarz inequality (1.5) that

$$N(x) = N(x_1 e_1 + \ldots + x_n e_n) \le N(x_1 e_1) + \ldots + N(x_n e_n)$$

$$\le |x_1| \cdot N(e_1) + \ldots + |x_n| \cdot N(e_n) \le \|x\|_2 \cdot C_2,$$

with $C_2 = \sqrt{N(e_1)^2 + \ldots + N(e_n)^2}$. This proves the second inequality of (2.4).
We now see the continuity of $N(x)$ as follows:

$$N(x) - N(x_0) = N(x - x_0 + x_0) - N(x_0) \le N(x - x_0) + N(x_0) - N(x_0) \le C_2 \|x - x_0\|_2,$$
and similarly $N(x_0) - N(x) = \ldots \le C_2 \|x_0 - x\|_2$, so that

(2.6) $|N(x) - N(x_0)| \le C_2 \|x - x_0\|_2$.

We then consider the function $N(x)$ on the compact set

$$K = \{z \in \mathbb{R}^n ; \|z\|_2 = 1\}.$$

By Theorem 2.3, it admits a minimum at some $u \in K$, i.e.,

(2.7) $N(z) \ge N(u)$ for all $z \in K$.

Putting $C_1 = N(u)$, which is positive by (N1), we have for an arbitrary $x \in \mathbb{R}^n$
($x \ne 0$) that $x/\|x\|_2 \in K$, and hence also $N(x/\|x\|_2) \ge C_1$. By (N2), this gives
$N(x) \ge C_1 \|x\|_2$.

This proves the first inequality of (2.4). □
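The two constants of the proof can be estimated numerically for $n = 2$. The sketch below is only an illustration: the function name `constants`, the sampling of the unit circle, and the two example norms are our own choices. It approximates $C_1$ as the minimum of $N$ over sampled points of the Euclidean unit circle and computes $C_2 = \sqrt{N(e_1)^2 + N(e_2)^2}$ as in the proof.

```python
import math


def norm1(x):
    return sum(abs(c) for c in x)


def norm_max(x):
    return max(abs(c) for c in x)


def norm2(x):
    return math.sqrt(sum(c * c for c in x))


def constants(N, samples=3600):
    """Approximate C1 = min of N on the Euclidean unit circle (by sampling)
    and compute C2 = sqrt(N(e1)^2 + N(e2)^2) as in the proof, for n = 2."""
    circle = [(math.cos(2 * math.pi * k / samples), math.sin(2 * math.pi * k / samples))
              for k in range(samples)]
    c1 = min(N(z) for z in circle)
    c2 = math.sqrt(N((1.0, 0.0)) ** 2 + N((0.0, 1.0)) ** 2)
    return c1, c2


if __name__ == "__main__":
    for N in (norm1, norm_max):
        c1, c2 = constants(N)
        x = (0.3, -0.7)
        print(N.__name__, c1, c2, c1 * norm2(x) <= N(x) <= c2 * norm2(x))
```

Note that the value of $C_2$ produced by the proof need not be the best possible one: for the maximum norm it equals $\sqrt{2}$, although $\|x\|_\infty \le \|x\|_2$ already holds.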


Uniform Continuity and Uniform Convergence

Exactly as in Sect. III.4, we call a function $f : A \to \mathbb{R}^m$, $A \subset \mathbb{R}^n$ uniformly
continuous if it is continuous on $A$ and if the $\delta$ in Definition 2.1 can be chosen
independently of $x_0 \in A$. We have the following extension of Theorem III.4.5.

(2.5) Theorem (Heine 1872). Let $f : K \to \mathbb{R}^m$ be continuous on $K$ and let
$K \subset \mathbb{R}^n$ be a compact set. Then, $f$ is uniformly continuous on $K$.

Proof. The two proofs of Theorem III.4.5 can easily be adapted to the case of
several variables. Let us give, for our pleasure and as an exercise, a third proof
using Theorem 1.21 of Heine-Borel.

We know by hypothesis that

(2.8) $\forall x_0 \in K \ \ \forall \varepsilon > 0 \ \ \exists \delta > 0 \ \ \forall x \in K : \ \|x - x_0\| < \delta \ \Longrightarrow \ \|f(x) - f(x_0)\| < \varepsilon$.

The idea is to consider the discs $\{B_\delta(x_0)\}_{x_0 \in K}$ as an open covering of $K$ and to
extract a finite covering from it. But we will quickly realize that this will not work
very well. Let’s be more careful.

We fix an $\varepsilon > 0$. Then, we define for every $a \in K$ an open set

$$U_a = \{x : \|x - a\| < \delta/2 \ \text{ with } \delta \text{ depending on } x_0 = a \text{ defined in (2.8)}\}.$$

They form an open covering of $K$. Since $K$ is compact, already a finite number
$U_{a_1}, \ldots, U_{a_N}$ cover the set $K$. With the corresponding numbers $\delta_1, \ldots, \delta_N$, we
define

$$\delta = \min\{\delta_1/2, \delta_2/2, \ldots, \delta_N/2\}.$$

Now let $x \in K$ and $y \in K$ be arbitrary points satisfying $\|x - y\| < \delta$. We
will show that $\|f(x) - f(y)\| < 2\varepsilon$. Since $x \in K$, there exists an index $i$ with
$x \in U_{a_i}$, i.e., $\|x - a_i\| < \delta_i/2$. It then follows from $\|x - y\| < \delta \le \delta_i/2$ and the
triangle inequality that $\|y - a_i\| < \delta_i$. From (2.8), we thus have

$$\|f(x) - f(y)\| \le \|f(x) - f(a_i)\| + \|f(a_i) - f(y)\| < \varepsilon + \varepsilon = 2\varepsilon,$$

which proves the statement. □

All definitions and results of Sect. III.4 concerning uniform convergence of a
sequence of functions carry over immediately to the case of several dimensions.
Therefore, if a sequence of continuous functions $f_k : A \to \mathbb{R}^m$, $A \subset \mathbb{R}^n$ converges
uniformly on $A$ to a function $f(x)$, this limit function is continuous (a
straightforward extension of Theorem III.4.2). Here is an interesting example.



FIGURE 2.3. Curve of Peano-Hilbert


Curve of Peano-Hilbert. 

A continuous curve can fill a portion of space: this is one of the most re- 
markable facts of set theory, whose discovery we owe to G. Peano. 

(Hausdorff 1914, p. 369) 

Cantor (1878) discovered the sensational result that there is a one-to-one correspondence
between the points of an interval and those of a square. But Cantor’s
mapping was not continuous. Peano (1890) then found, by a skillful manipulation
of the coordinates in base 3, a continuous curve filling a whole square. Soon
thereafter, Hilbert (1891) discovered such curves by a beautiful “geometrische
Anschauung”: he repeatedly divided the squares into four subsquares and labeled
their centers consecutively by following the direction of the previous curve (see
Fig. 2.3).



FIGURE 2.4. Creation of Hilbert’s curve 


Another Construction. Let $\varphi(t) = (x(t), y(t))$, $0 \le t \le 1$, be an arbitrary continuous
curve connecting the points $A = (0, 0)$ for $t = 0$ and $B = (1, 0)$ for $t = 1$
(see Fig. 2.4). We then define a new curve by

$$(\Phi\varphi)(t) = \begin{cases} \tfrac{1}{2}\bigl(y(4t),\ x(4t)\bigr) & \text{if } 0 \le t \le \tfrac{1}{4} \\[1mm] \tfrac{1}{2}\bigl(x(4t - 1),\ 1 + y(4t - 1)\bigr) & \text{if } \tfrac{1}{4} \le t \le \tfrac{2}{4} \\[1mm] \tfrac{1}{2}\bigl(1 + x(4t - 2),\ 1 + y(4t - 2)\bigr) & \text{if } \tfrac{2}{4} \le t \le \tfrac{3}{4} \\[1mm] \tfrac{1}{2}\bigl(2 - y(4t - 3),\ 1 - x(4t - 3)\bigr) & \text{if } \tfrac{3}{4} \le t \le 1. \end{cases}$$

This again gives a continuous curve connecting $A = (0, 0)$ for $t = 0$ and $B = (1, 0)$
for $t = 1$ (see second picture of Fig. 2.4) so that the procedure can be
repeated (third picture of Fig. 2.4). This leads to a sequence of functions $\varphi_0 = \varphi$,
$\varphi_1 = \Phi\varphi_0$, $\varphi_2 = \Phi\varphi_1$, and so on. Whenever we start from another initial curve
$\psi(t)$ with $\|\varphi(t) - \psi(t)\|_\infty \le K$ for $t \in [0, 1]$, then $\|(\Phi\varphi)(t) - (\Phi\psi)(t)\|_\infty \le K/2$
(see Fig. 2.4). It follows that

(2.9) $\|\varphi_k(t) - \psi_k(t)\|_\infty \le K \cdot 2^{-k}$,

and, by putting $\psi(t) = \varphi_m(t)$ and $K = 1$,

(2.10) $\|\varphi_k(t) - \varphi_{k+m}(t)\|_\infty \le 2^{-k}$.

We see from (2.10) that the sequence $\varphi_k(t)$ converges uniformly (Cauchy’s criterion
(III.4.4)), and thus has a continuous limit $\varphi_\infty(t)$ (Theorem III.4.2). Further,
from (2.9) we see that the limiting function is independent of the initial function
$\varphi_0(t)$. Hilbert’s curve from Fig. 2.3, when compared with the curves of Fig. 2.4,
has slight modifications toward the end points of the intervals $[i/4^k, (i + 1)/4^k]$,
which disappear as $k \to \infty$.
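The refinement step lends itself to a short experiment. The sketch below assumes that the four cases of the construction read $\frac12(y(4t), x(4t))$, $\frac12(x(4t-1), 1+y(4t-1))$, $\frac12(1+x(4t-2), 1+y(4t-2))$ and $\frac12(2-y(4t-3), 1-x(4t-3))$ on the four quarters of $[0,1]$; the function names and the two initial curves are our own choices. It checks that the end points $A$ and $B$ are preserved and that the distance between two families of curves is halved at every step.

```python
def refine(phi):
    """One refinement step: glue four half-size copies of the curve phi.

    phi maps t in [0, 1] to a point (x, y)."""
    def new_phi(t):
        if t <= 0.25:
            x, y = phi(4 * t)
            return (0.5 * y, 0.5 * x)
        if t <= 0.5:
            x, y = phi(4 * t - 1)
            return (0.5 * x, 0.5 * (1 + y))
        if t <= 0.75:
            x, y = phi(4 * t - 2)
            return (0.5 * (1 + x), 0.5 * (1 + y))
        x, y = phi(4 * t - 3)
        return (0.5 * (2 - y), 0.5 * (1 - x))
    return new_phi


def iterate(phi, k):
    """Apply the refinement step k times."""
    for _ in range(k):
        phi = refine(phi)
    return phi


def segment(t):
    """Initial curve: the straight line from A = (0, 0) to B = (1, 0)."""
    return (t, 0.0)


def bump(t):
    """Another initial curve from A to B, at sup-distance 1 from `segment`."""
    return (t, 4 * t * (1 - t))


def sup_distance(phi, psi, samples=1025):
    """Maximum-norm distance of two curves, sampled on an even grid of [0, 1]."""
    ts = [i / (samples - 1) for i in range(samples)]
    return max(max(abs(phi(t)[0] - psi(t)[0]), abs(phi(t)[1] - psi(t)[1])) for t in ts)


if __name__ == "__main__":
    for k in range(6):
        print(k, sup_distance(iterate(segment, k), iterate(bump, k)))
```

The printed distances decrease like $2^{-k}$, in agreement with the estimate (2.9) for $K = 1$.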

It is interesting to note that both coordinates $x(t)$ and $y(t)$ are new examples
of continuous functions that are nowhere differentiable (cf. Sect. III.9).

Linear Mappings


Linear mappings are important examples of uniformly continuous functions. Let
$A$ be a matrix

(2.11)
$$A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \ldots & a_{mn} \end{pmatrix}.$$

We consider the mapping $x \mapsto y = Ax$, where

(2.12) $y_i = \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, 2, \ldots, m$

(when working with matrices it is more convenient to write vectors as column
vectors, so that (2.12) is just the usual product of two matrices).

(2.6) Theorem (Peano 1888a, p. 454). In the Euclidean norm, we have for all
$x \in \mathbb{R}^n$

(2.13) $\|Ax\|_2 \le M \cdot \|x\|_2$, where $M = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}$.


Proof. Applying the Cauchy-Schwarz inequality (1.5) to the sum in (2.12),

$$y_i^2 = \Bigl(\sum_{j=1}^{n} a_{ij} x_j\Bigr)^2 \le \Bigl(\sum_{j=1}^{n} a_{ij}^2\Bigr)\Bigl(\sum_{j=1}^{n} x_j^2\Bigr),$$

and summing up from $i = 1$ to $m$, yields the desired statement. □

As a consequence of the linearity of $Ax$, we get

$$\|Ax - Ax_0\| \le M \cdot \|x - x_0\|,$$

with $M$ given by Theorem 2.6. This shows that the mapping $x \mapsto Ax$ is uniformly
continuous on $\mathbb{R}^n$ (take $\delta = \varepsilon/M$ independent of $x_0$).

Example. Consider the two-dimensional matrix

$$A = \ldots, \qquad M = \sqrt{6 + 2\sqrt{2}} = 2.9713.$$

FIGURE 2.5. Majorization of a linear function


In Fig. 2.5, we have plotted the sets $\{x ; \|x\|_2 \le 1\}$ and $\{y = Ax ; \|x\|_2 \le 1\}$.
We see that the second set lies in a disc of radius $M$, confirming the estimate
(2.13). Moreover, we observe that the value $M$ is not optimal.

The Matrix-Norm. The smallest number $M$ satisfying the inequality of (2.13) is
called the norm (or matrix-norm) of $A$. It is denoted by

(2.14) $\|A\|_2 := \sup\{\|Ax\|_2 ; \|x\|_2 \le 1\}$.

Obviously, we have $\|A\|_2 \le M$ with the $M$ of (2.13), and

(2.15) $\|Ax\|_2 \le \|A\|_2 \cdot \|x\|_2$

for all vectors $x$. The precise computation of $\|A\|_2$ involves the eigenvalues of
$A^T A$ and gives, for the above example, $\|A\|_2 = \sqrt{3 + \sqrt{2} + \sqrt{5 + 2\sqrt{2}}} = 2.6855$
(see Fig. 2.5 and Exercise 4.9).
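Both quantities can be computed in a few lines for a $2 \times 2$ matrix. The sketch below is only an illustration: the matrix `A` is a hypothetical choice of ours, picked because it reproduces the two numerical values quoted above (it is not claimed to be the matrix of the example), and the helper names are ours as well. The matrix-norm is obtained from the closed formula for the largest eigenvalue of the symmetric $2 \times 2$ matrix $A^T A$.

```python
import math


def frobenius_bound(A):
    """M of Theorem 2.6: square root of the sum of all squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))


def spectral_norm_2x2(A):
    """||A||_2 of a 2x2 matrix: square root of the largest eigenvalue of A^T A."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    largest = 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)
    return math.sqrt(largest)


def apply(A, x):
    """Matrix-vector product y = A x, as in Eq. (2.12)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Hypothetical matrix (our assumption), chosen to match the quoted values.
A = [[math.sqrt(2), 1.0], [0.0, 1.0 + math.sqrt(2)]]

if __name__ == "__main__":
    print(frobenius_bound(A), spectral_norm_2x2(A))
    x = [0.6, 0.8]  # a vector with Euclidean norm 1
    print(math.hypot(*apply(A, x)))
```

For every unit vector $x$ the printed length of $Ax$ stays below the matrix-norm, which in turn stays below the cruder bound $M$.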

Hausdorff’s Characterization of Continuous Functions

We are interested in a new characterization of continuity, more elegant than that
of Definition 2.1. Instead of working with norms, we shall use neighborhoods and
open sets.

For a given function $f : \mathbb{R}^n \to \mathbb{R}^m$ and for sets $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$, we
denote by

(2.16) $f(U) = \{f(x) \in \mathbb{R}^m ; x \in U\}$ the direct image of $U$,

(2.17) $f^{-1}(V) = \{x \in \mathbb{R}^n ; f(x) \in V\}$ the inverse image of $V$.

(2.7) Example. We choose a function $f : \mathbb{R}^2 \to \mathbb{R}^2$, mapping $(x, y)$ to $(u, v)$ by

(2.18) u = x+ u = (x + 2)t/ 3 -^( a; + l)y+|. 
This function is sketched for $-1.1 \le x, y \le 1.1$ in Fig. 2.6. For a subset $U$ (light
grey animal of the feline species¹) we have drawn the set $f(U)$ and for $V$ (dark
grey animal of the feline species) the set $f^{-1}(V)$. We observe that the inverse
image of a connected set is not necessarily connected. This is due to the fact that
in our example, the function $f$ is not bijective.



Characterization of Continuity by Neighborhoods. The set of $x \in \mathbb{R}^n$ satisfying
$\|x - x_0\| < \delta$ is $B_\delta(x_0)$ (see Eq. (1.23)), the set of $x \in \mathbb{R}^n$ satisfying
$\|f(x) - y_0\| < \varepsilon$ is $f^{-1}(B_\varepsilon(y_0))$. Therefore, if $y_0 = f(x_0)$ and $A = \mathbb{R}^n$, the
condition of Definition 2.1 can be expressed by

(2.19) $\forall \varepsilon > 0 \ \ \exists \delta > 0 \quad B_\delta(x_0) \subset f^{-1}(B_\varepsilon(y_0))$.

Since a neighborhood $V$ of $y_0$ is characterized by the existence of an $\varepsilon > 0$ such
that $B_\varepsilon(y_0) \subset V$, we see that (2.19) is equivalent to the following:

(2.20) for every neighborhood $V$ of $y_0$, $f^{-1}(V)$ is a neighborhood of $x_0$.

This interpretation of continuity at $x_0$ is more elegant, and is still valid in more
general “topological spaces”.

A characterization in terms of open and closed sets of a function $f : \mathbb{R}^n \to \mathbb{R}^m$
being everywhere continuous is given by the following theorem.

(2.8) Theorem (see Hausdorff 1914, p. 361). For a function $f : \mathbb{R}^n \to \mathbb{R}^m$ the
following three statements are equivalent:

i) $f$ is continuous on $\mathbb{R}^n$;

ii) for every open set $V \subset \mathbb{R}^m$, the set $f^{-1}(V)$ is open in $\mathbb{R}^n$;

iii) for every closed set $F \subset \mathbb{R}^m$, the set $f^{-1}(F)$ is closed in $\mathbb{R}^n$.


¹ Arnold’s cat.

Proof. (i) $\Rightarrow$ (ii): let $V \subset \mathbb{R}^m$ be an open set and take $x_0 \in f^{-1}(V)$, so that
$f(x_0) \in V$. Since $V$ is open, it is a neighborhood of $f(x_0)$ and by (2.20) $f^{-1}(V)$
is a neighborhood of $x_0$. This is true for all $x_0 \in f^{-1}(V)$. Hence, the set $f^{-1}(V)$
is open by Definition 1.11.

(ii) $\Rightarrow$ (i): assuming (ii), we shall prove that $f$ is continuous at an arbitrary
point $x_0 \in \mathbb{R}^n$. Let $\varepsilon > 0$ be given and set $y_0 = f(x_0)$. The set $B_\varepsilon(y_0)$ is open,
so that by assumption (ii), $f^{-1}(B_\varepsilon(y_0))$ is also open. Definition 1.11 then implies
the existence of a $\delta > 0$ with $B_\delta(x_0) \subset f^{-1}(B_\varepsilon(y_0))$. But this is simply the
continuity of $f$ at $x_0$ (see (2.19)).

(ii) $\Leftrightarrow$ (iii): the equivalence of statements (ii) and (iii) follows from the identity
$f^{-1}(\complement V) = \complement(f^{-1}(V))$ and from Theorem 1.14. □



FIGURE 2.7. Inverse image for the function (2.21)




FIGURE 2.8. Inverse image for the function (2.21)


(2.9) Example. Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(0) = 0$ and

(2.21) $f(x) = \sin(1/x^2)$ for $x \ne 0$.

This function is discontinuous at $x = 0$. We shall demonstrate that for discontinuous
functions, (ii) and (iii) above are not true in general.

For example, the set $V = (1/3, 2/3)$ is open and its inverse image $f^{-1}(V) =
(x_2, x_1) \cup (x_4, x_3) \cup \ldots$ is also open (see Fig. 2.7). However, the set $F = [1/3, 2/3]$
is closed, but $f^{-1}(F) = [x_2, x_1] \cup [x_4, x_3] \cup \ldots$ is not closed, because the limit
of the sequence $\{x_i\}$ does not lie in $f^{-1}(F)$.

For the open set $V = (-1/2, 1/2)$ the inverse image $f^{-1}(V) = (x_0, \infty) \cup
(x_2, x_1) \cup \ldots \cup \{0\}$ is not open because it is not a neighborhood of 0 (see Fig. 2.8).
On the other hand, the inverse image of the closed set $F = [-1/2, 1/2]$, which is
$f^{-1}(F) = [x_0, \infty) \cup [x_2, x_1] \cup \ldots \cup \{0\}$, is closed.

(2.10) Example. Our last example illustrates the fact that Theorem 2.8 does not
have an analog for direct images. We consider the continuous function $f : \mathbb{R} \to \mathbb{R}$
defined by (see Fig. 2.9)

(2.22) $f(x) = \dfrac{2x}{1 + x^2}$.

The image of the open set $U = (3/4, 2)$ is $f(U) = (4/5, 1]$, which is not open;
that of the closed set $F = [3, \infty)$ is $f(F) = (0, 3/5]$, which is not closed.
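The two images can be checked by sampling. The sketch below assumes that Eq. (2.22) reads $f(x) = 2x/(1 + x^2)$, which is consistent with the images stated above; the helper `sampled_range` is our own and only inspects finitely many grid points, so it illustrates the claim rather than proving it.

```python
def f(x):
    """Assumed form of Eq. (2.22): f(x) = 2x / (1 + x^2)."""
    return 2 * x / (1 + x * x)


def sampled_range(a, b, samples=100001):
    """Smallest and largest value of f on an evenly spaced grid of [a, b]."""
    values = [f(a + (b - a) * k / (samples - 1)) for k in range(samples)]
    return min(values), max(values)


if __name__ == "__main__":
    # On [3/4, 2] the maximum 1 is attained at the interior point x = 1,
    # the minimum 4/5 only at the end point x = 2, which is excluded from U.
    print(sampled_range(0.75, 2.0))
    # On [3, oo) the values decrease from 3/5 towards 0 without reaching it.
    print(f(3.0), f(1e3), f(1e6))
```

The maximum 1 is attained inside $U$, so $f(U)$ contains a boundary point of itself, while the values on $F$ approach 0 without ever reaching it.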



Integrals with Parameters 

Suppose that we have a function of two variables $f(x, p)$ defined for $x \in [a, b]$
and $p \in [c, d]$. If we integrate this function with respect to $x$,

(2.23) $F(p) = \int_a^b f(x, p)\, dx$,

we obtain a function of $p$. The question is whether we can ensure that $F(p)$ is
continuous.

(2.11) Counterexamples. In formula (b) of Exercise III. 5. 9, we replace n 2 by 
1 /p, and then by p : 

(2.24) f(x,p) = {1 ^J /pr P> 0,0<*<1, 

(2.25) = _ p > 0, 0 < x < oo, 
and, in both cases, $f(x, p) = 0$ if $p = 0$.

In the first case, $p \to 0$ corresponds to $n \to \infty$ in Fig. III.5.5b, hence $F(p) =
\int_0^1 f(x, p)\, dx$ will tend to a nonzero constant, whereas $F(0) = 0$. We observe that
$f(x, p)$ is continuous everywhere except at the point $x = p = 0$.

In the second case, for $p \to 0$, the function $f(x, p)$ represents a hump that
flattens out to infinity while preserving the same area. Again, $F(p)$ is not continuous
at $p = 0$. This time, $f(x, p)$ is continuous everywhere, but the domain of
integration is unbounded.

In the case where $f(x, p)$ is continuous everywhere and the domain of integration
is a compact interval, we know that $f(x, p)$ is uniformly continuous (Theorem 2.5)
and it is an easy exercise to prove the following result (see also the proof of
Theorem 3.11 below).

(2.12) Theorem. If $f(x, p)$ is a continuous function on $[a, b] \times [c, d]$, then

$$F(p) = \int_a^b f(x, p)\, dx$$

is a continuous function on $[c, d]$. □
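A small numerical experiment illustrates the theorem. In the sketch below, the integrand $f(x, p) = \cos(px)$ on $[0, 1] \times [0, 2]$ and the midpoint rule are our own choices for illustration; for this integrand one has $F(p) = \sin(p)/p$ for $p \ne 0$ and $F(0) = 1$, so the computed values can be compared with a closed formula.

```python
import math


def F(p, f, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of f(x, p) over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h, p) for j in range(n))


def f_cos(x, p):
    """Example integrand (our choice), continuous on [0, 1] x [0, 2]."""
    return math.cos(p * x)


if __name__ == "__main__":
    for p in (0.0, 1e-3, 1e-2, 1e-1, 1.0):
        exact = 1.0 if p == 0.0 else math.sin(p) / p
        print(p, F(p, f_cos), exact)
```

As $p \to 0$ the computed values of $F(p)$ approach $F(0) = 1$, as continuity at $p = 0$ requires.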


Exercises 

2.1 Show that there are three different values of $t$ for which the Hilbert curve
is equal to $(1/2, 1/2)$.

2.2 Prove that the “matrix-norm” (2.14) is a norm on $\mathbb{R}^{n \cdot m}$.



FIGURE 2.10. Peano’s curve

2.3 a) Fig. 2.10 shows Peano’s original formulas (see Peano 1890) coded and
plotted. Give an explanation similar to that of Fig. 2.4 for its construction
(you will need an animal that connects opposite corners of a square).
b) In the very last sentence of his paper, Peano asserts, without any further
explanation, that $x$ and $y$ as functions of $t$ have nowhere a derivative (“Ces
x et y, fonctions continues de la variable t, manquent toujours de dérivée”).
Prove this statement.

Hint. Adapt de Rham’s proof of Theorem III.9.1 by choosing $\alpha_n = i/9^n$,
$\beta_n = (i + 1)/9^n$. For these arguments the Peano curve is in opposite corners
of a square of side $3^{-n}$, so that $r_n = 3^n$.


2.4 Show that if $K \subset \mathbb{R}^n$ is compact and if $f : K \to \mathbb{R}^m$ is continuous, then
$f(K) \subset \mathbb{R}^m$ is compact.

2.5 The function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by


f(xi,x 2 ) 



if x\ + X 2 > 0 
if x\ = x 2 = 0 


is discontinuous at $(0, 0)$ (why?). Find an open set $U \subset \mathbb{R}$ and a closed set
$F \subset \mathbb{R}$, such that $f^{-1}(U)$ is not open and $f^{-1}(F)$ is not closed.

2.6 Define a map $P : \mathbb{R}^2 \to \mathbb{R}^2$ (which we call a projection) by

$$P(x_1, x_2) = (x_1, 0).$$

a) Show that $P$ is continuous.

b) Find an open set $U \subset \mathbb{R}^2$ for which $P(U)$ is not open.

c) Find a closed set $F \subset \mathbb{R}^2$ for which $P(F)$ is not closed.

Remark. (b) is very easy, but (c) is less easy. Because of Exercise 2.4, you
will have to look for an unbounded $F$.



FIGURE 2.11. Plot of $(\cos x - \cos y)/(x - y)$


2.7 A naive user of a mathematical computer package (such as “Maple”) wants
a 3D plot of the function

$$g(x, y) = \frac{\cos x - \cos y}{x - y}, \qquad -8 \le x \le 8, \quad -8 \le y \le 8,$$

and obtains a result like that of Fig. 2.11. How must $g$ be defined for $x = y$ in
order to obtain a continuous function? Then verify, for the function obtained,
the conditions of Definition 2.1 for a continuous function of two variables.

IV.3 Differentiable Functions of Several Variables


We Germans use instead, following Jacobi, the round d for partial deriva- 
tives. (Weierstrass 1874) 

Our next aim is to introduce the notion of differentiability for functions of more
than one variable. Since a division by the vector $x - x_0$ does not make sense, there
is no direct way of extending Definition III.6.1.


Partial Derivatives. If, in considering a function $f : U \to \mathbb{R}$, $U \subset \mathbb{R}^n$, we fix
all variables but one and regard $f$ as a function of this single variable, we can
apply Definition III.6.1. Consider, for example, a function $y = f(x_1, x_2)$ of two
variables in a neighborhood of $(x_{10}, x_{20})$. We then denote the derivatives by

(3.1)
$$\frac{\partial f}{\partial x_1}(x_{10}, x_{20}) = \lim_{h \to 0} \frac{f(x_{10} + h, x_{20}) - f(x_{10}, x_{20})}{h}$$

$$\frac{\partial f}{\partial x_2}(x_{10}, x_{20}) = \lim_{h \to 0} \frac{f(x_{10}, x_{20} + h) - f(x_{10}, x_{20})}{h}$$

and call them partial derivatives of $f$ with respect to $x_1$ and $x_2$, respectively.
Other notations are $f_{x_i}(x_{10}, x_{20})$, $D_i f(x_{10}, x_{20})$, $\partial_i f(x_{10}, x_{20})$, or the like.

Geometrically, these partial derivatives can be interpreted as follows: the
function $y = f(x_1, x_2)$ defines a surface in $\mathbb{R}^3$ (with coordinates $x_1$, $x_2$, and
$y$) whose intersection with the plane $x_2 = x_{20}$ is the curve $x_1 \mapsto f(x_1, x_{20})$.
Therefore, the partial derivative $\partial f/\partial x_1$ is the slope of this curve, and

$$y = f(x_{10}, x_{20}) + \frac{\partial f}{\partial x_1}(x_{10}, x_{20})\,(x_1 - x_{10})$$

is the tangent to this curve at $(x_{10}, x_{20})$. Similarly, the tangent to the curve $x_2 \mapsto
f(x_{10}, x_2)$ is $y = f(x_{10}, x_{20}) + \partial f/\partial x_2\,(x_{10}, x_{20})(x_2 - x_{20})$, and the plane
spanned by these two tangents is given by

(3.2) $y = f(x_{10}, x_{20}) + \dfrac{\partial f}{\partial x_1}(x_{10}, x_{20})\,(x_1 - x_{10}) + \dfrac{\partial f}{\partial x_2}(x_{10}, x_{20})\,(x_2 - x_{20})$.

The function $f(x_1, x_2)$ will be called differentiable at $(x_{10}, x_{20})$, if the plane (3.2)
is a “good” approximation to $f(x_1, x_2)$ in a neighborhood of $(x_{10}, x_{20})$ and not
only along the lines $x_1 = x_{10}$ and $x_2 = x_{20}$.


(3.1) Example. The surface defined by $y = e^{-x_1^2 - x_2^2}$ is plotted in Fig. 3.1. The 
partial derivatives of this function are 

\[ \frac{\partial f}{\partial x_1}(x_1, x_2) = -2x_1\, e^{-x_1^2 - x_2^2}, \qquad \frac{\partial f}{\partial x_2}(x_1, x_2) = -2x_2\, e^{-x_1^2 - x_2^2}. \]

By evaluating these derivatives at $(x_{10}, x_{20}) = (0.8, 1.0)$, we get the tangent plane 
at this point with the help of Eq. (3.2). It is included in Fig. 3.1. 
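
The tangent plane of Example 3.1 can be checked numerically. The following is a minimal sketch in Python (standard library only); the function names and the choice of test points are illustrative assumptions, not part of the text. It evaluates the right-hand side of Eq. (3.2) at $(0.8, 1.0)$ and prints the distance between surface and plane for shrinking steps.

```python
import math

def f(x1, x2):
    """Surface of Example 3.1: y = exp(-x1^2 - x2^2)."""
    return math.exp(-x1**2 - x2**2)

def grad_f(x1, x2):
    """Partial derivatives (df/dx1, df/dx2) as given in Example 3.1."""
    e = math.exp(-x1**2 - x2**2)
    return (-2.0 * x1 * e, -2.0 * x2 * e)

def tangent_plane(x1, x2, x10=0.8, x20=1.0):
    """Right-hand side of Eq. (3.2) at the base point (x10, x20)."""
    d1, d2 = grad_f(x10, x20)
    return f(x10, x20) + d1 * (x1 - x10) + d2 * (x2 - x20)

def plane_error(step):
    """Distance between surface and plane at (0.8 + step, 1.0 + step)."""
    x1, x2 = 0.8 + step, 1.0 + step
    return abs(f(x1, x2) - tangent_plane(x1, x2))

# Illustrative steps: the error should shrink roughly like step^2.
for step in (0.1, 0.01, 0.001):
    print(step, plane_error(step))
```

Dividing the step by 10 reduces the error by a factor of about 100, which is the sense in which the plane is a “good” approximation in all directions and not only along the coordinate lines.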
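Two Dependent Variables. In the case of two functions of two variables 

\[ y_1 = f_1(x_1, x_2), \qquad y_2 = f_2(x_1, x_2), \tag{3.3} \]

we write (3.2) for each of the two functions: 

\[
\begin{aligned}
y_1 &= f_1(x_{10}, x_{20}) + \frac{\partial f_1}{\partial x_1}(x_{10}, x_{20})(x_1 - x_{10}) + \frac{\partial f_1}{\partial x_2}(x_{10}, x_{20})(x_2 - x_{20}), \\
y_2 &= f_2(x_{10}, x_{20}) + \frac{\partial f_2}{\partial x_1}(x_{10}, x_{20})(x_1 - x_{10}) + \frac{\partial f_2}{\partial x_2}(x_{10}, x_{20})(x_2 - x_{20}).
\end{aligned}
\tag{3.4}
\]

This formula is conveniently written in vector notation as 

\[ y = f(x_0) + f'(x_0)(x - x_0), \tag{3.4'} \]

where $f'(x_0)$ is now a matrix, the so-called Jacobian (see Jacobi 1841): 

\[
f'(x_0) = \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(x_0) & \dfrac{\partial f_1}{\partial x_2}(x_0) \\[2mm]
\dfrac{\partial f_2}{\partial x_1}(x_0) & \dfrac{\partial f_2}{\partial x_2}(x_0)
\end{pmatrix}.
\tag{3.5}
\]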


This notation will allow us to carry over most formulas of Sect. III.6 to the case of 
several variables. 


(3.2) Example. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}^2$ defined by 

\[
f(x) = \begin{pmatrix} \sqrt{2}\,x_1 + \sin(x_1 + x_2) \\ \sqrt{2}\,x_2 + \cos(x_1 - x_2) \end{pmatrix}.
\tag{3.6}
\]

This function sends the origin $(x_1, x_2) = (0, 0)$ to the point $(y_1, y_2) = (0, 1)$, 
straight lines to curves, and small squares to sets that look like parallelograms 
(see Fig. 3.2). The Jacobian for (3.6) is 

\[
f'(x) = \begin{pmatrix}
\sqrt{2} + \cos(x_1 + x_2) & \cos(x_1 + x_2) \\
-\sin(x_1 - x_2) & \sqrt{2} + \sin(x_1 - x_2)
\end{pmatrix},
\tag{3.7}
\]

and Eq. (3.4) becomes, for $x_0 = (0, 0)^T$, $y_0 = f(x_0)$, 






\[
\begin{pmatrix} y_1 - y_{10} \\ y_2 - y_{20} \end{pmatrix}
= \begin{pmatrix} \sqrt{2} + 1 & 1 \\ 0 & \sqrt{2} \end{pmatrix}
\begin{pmatrix} x_1 - x_{10} \\ x_2 - x_{20} \end{pmatrix}.
\tag{3.8}
\]

The linear map given by (3.8) is precisely that of Fig. 2.5, and on comparing the 
two pictures, one can see that the nonlinear mapping (3.6) is approximated, in a 
small neighborhood of $x_0$, by the linear map defined by the Jacobian. We observe 
that for small values of $x - x_0$, the $x_1$-axis (i.e., $x_2 = x_{20} = 0$) is mapped to 
a multiple of $(\sqrt{2} + 1, 0)^T$ and the $x_2$-axis to a multiple of $(1, \sqrt{2})^T$ (see the 
arrows in Fig. 3.2). Hence, the columns of the Jacobian matrix are the images of 
the “infinitesimal unit vectors”. 
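
The statement that the columns of the Jacobian are the images of the “infinitesimal unit vectors” can be illustrated numerically. The sketch below (Python, standard library only; the helper names and the step size are assumptions made for illustration) compares the analytic Jacobian (3.7) with central difference quotients of the mapping (3.6) along $e_1$ and $e_2$.

```python
import math

SQRT2 = math.sqrt(2.0)

def f(x1, x2):
    """The mapping (3.6)."""
    return (SQRT2 * x1 + math.sin(x1 + x2),
            SQRT2 * x2 + math.cos(x1 - x2))

def jacobian_exact(x1, x2):
    """The Jacobian (3.7), stored as a list of rows."""
    return [[SQRT2 + math.cos(x1 + x2), math.cos(x1 + x2)],
            [-math.sin(x1 - x2), SQRT2 + math.sin(x1 - x2)]]

def jacobian_fd(x1, x2, h=1e-6):
    """Column j is the central difference quotient of f in direction e_j."""
    cols = []
    for d1, d2 in ((h, 0.0), (0.0, h)):
        fp = f(x1 + d1, x2 + d2)
        fm = f(x1 - d1, x2 - d2)
        cols.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(2)])
    # transpose the list of columns into a list of rows
    return [[cols[j][i] for j in range(2)] for i in range(2)]

print(jacobian_exact(0.0, 0.0))  # the matrix of Eq. (3.8)
print(jacobian_fd(0.0, 0.0))
```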



FIGURE 3.2. Graph of the mapping (3.6) 


Differentiability 

. . . that Weierstrass’s direct teaching had the effect of discouraging the 
spontaneity of the students and was only fully understandable by those who 
had already learned the subject somewhere else. The most important treatises 
have been written by foreigners . . . Probably the first is by my friend 
Stolz (Innsbruck): “Vorlesungen über allgemeine Arithmetik” .... 

(F. Klein 1926, Entwicklung der Math., p. 291) 

Let us consider a function 

\[ f : U \to \mathbb{R}^m, \qquad U \subset \mathbb{R}^n, \tag{3.9} \]

and assume that $x_0 \in U$ is an interior point of $U$ ($U$ is a neighborhood of $x_0$). 

(3.3) Definition (Stolz 1887, Fréchet 1906). The function (3.9) is differentiable 
at $x_0$ if there exists a linear mapping $f'(x_0) : \mathbb{R}^n \to \mathbb{R}^m$ and a function 
$r : U \to \mathbb{R}^m$, continuous at $x_0$ and satisfying $r(x_0) = 0$, such that 

\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + r(x)\,\|x - x_0\|. \tag{3.10} \]
(3.4) Remark. If a function is differentiable at $x_0$, then it is continuous at this 
point. Furthermore, all its partial derivatives exist at $x_0$. This follows from the fact 
that for $x - x_0 = h e_j$ (where $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T$ with the $j$th component 
equal to 1) Eq. (3.10) becomes 

\[ \frac{f(x_0 + h e_j) - f(x_0)}{h} = f'(x_0)\, e_j + r(x_0 + h e_j)\, \frac{|h|}{h}. \tag{3.11} \]

Since $r(x)$ is continuous at $x_0$, the limit of this expression exists for $h \to 0$ and is 
equal to 

\[ \frac{\partial f}{\partial x_j}(x_0) = f'(x_0)\, e_j, \qquad \text{whence} \qquad \frac{\partial f_i}{\partial x_j}(x_0) = f_i'(x_0)\, e_j \]

(here, $f(x) = (f_1(x), \ldots, f_m(x))^T$). Consequently, the linear mapping is unique. 

The analog of Carathéodory’s formulation (Eq. (6.6) in Sect. III.6) is given 
by the following lemma. 

(3.5) Lemma. The function $f(x)$ of (3.9) is differentiable at $x_0$ if and only if there 
exists a matrix-valued function $\varphi(x)$, depending on $x_0$ and continuous at $x_0$, such 
that 

\[ f(x) = f(x_0) + \varphi(x)(x - x_0). \tag{3.12} \]

The derivative of $f(x)$ at $x_0$ is given by $f'(x_0) = \varphi(x_0)$. 

Proof. For a given function $\varphi(x)$ we put 

\[ f'(x_0) := \varphi(x_0), \qquad r(x) := \big(\varphi(x) - \varphi(x_0)\big)\, \frac{x - x_0}{\|x - x_0\|}, \]

and we see that (3.10) holds. Since $(x - x_0)/\|x - x_0\|$ is bounded by 1, it follows 
from the continuity of $\varphi(x)$ at $x_0$ that $r(x) \to 0$ for $x \to x_0$. 

On the other hand, assume that (3.10) holds. We define $\varphi(x_0) := f'(x_0)$, 
and, for $x \ne x_0$, 

\[ \varphi(x) := f'(x_0) + r(x)\, \frac{(x - x_0)^T}{\|x - x_0\|} \tag{3.13} \]

(observe that the product of the column vector $r(x)$ with the row vector $(x - x_0)^T$ 
yields a matrix), and obtain $\varphi(x)(x - x_0) = f'(x_0)(x - x_0) + r(x)\,\|x - x_0\|$. The 
function $\varphi(x)$ is continuous at $x_0$ because, by Theorem 2.6, $\|\varphi(x) - f'(x_0)\| \le 
\|r(x)\|$, and $\|r(x)\| \to 0$ for $x \to x_0$. □ 

The following result gives a sufficient condition for differentiability, which 
can be checked by considering partial derivatives only. 
(3.6) Theorem. Consider a function $f : U \to \mathbb{R}$ and $x_0 \in U$ (interior point). If 
all partial derivatives $\partial f/\partial x_j$ exist in a neighborhood of $x_0$ and are continuous 
at $x_0$, then $f$ is differentiable at $x_0$. 

Proof. We shall give the proof for the case $n = 2$. The extension to arbitrary $n$ is 
straightforward. The idea is to write $f(x) - f(x_0)$ as 

\[ f(x_1, x_2) - f(x_{10}, x_{20}) = \big(f(x_1, x_2) - f(x_{10}, x_2)\big) + \big(f(x_{10}, x_2) - f(x_{10}, x_{20})\big) \]

and to apply Lagrange’s Theorem III.6.11 to each of the differences. This yields 

\[ f(x_1, x_2) - f(x_{10}, x_{20}) = \frac{\partial f}{\partial x_1}(\xi_1, x_2)(x_1 - x_{10}) + \frac{\partial f}{\partial x_2}(x_{10}, \xi_2)(x_2 - x_{20}). \]

Putting $\varphi(x_1, x_2) = \big(\frac{\partial f}{\partial x_1}(\xi_1, x_2),\ \frac{\partial f}{\partial x_2}(x_{10}, \xi_2)\big)$, we have established (3.12). 
The continuity of $\varphi(x)$ at $x_0$ follows from the assumptions. □ 

By Definition 3.3, a vector-valued function $f(x) = (f_1(x), \ldots, f_m(x))^T$ is 
differentiable at $x_0$ if and only if $f_i(x)$ is differentiable at $x_0$ for all $i = 1, \ldots, m$. 
It thus follows from Theorem 3.6 that functions whose components are polynomials 
in $x_1, \ldots, x_n$, rational functions, or elementary functions are differentiable at 
points where they are well-defined. 


Counterexamples 

Discontinuous Function Whose Partial Derivatives Exist Everywhere. Consider 
the function $f : \mathbb{R}^2 \to \mathbb{R}$, given by 

\[
f(x_1, x_2) = \begin{cases}
\dfrac{x_1 x_2}{x_1^2 + x_2^2} & \text{if } x_1^2 + x_2^2 > 0 \\[2mm]
0 & \text{if } x_1 = x_2 = 0
\end{cases}
\tag{3.14}
\]

(see Fig. 2.2). The partial derivatives vanish at the origin, because $f(x_1, 0) = 0$ 
for all $x_1$ and $f(0, x_2) = 0$ for all $x_2$. Away from the origin, the existence of the 
partial derivatives is clear. Nevertheless, the function (3.14) is not continuous at 
the origin (see Sect. IV.2). 


Discontinuous Function Whose Directional Derivatives Exist Everywhere. 
Partial derivatives are special cases of the so-called directional derivatives. Consider 
a function $f : \mathbb{R}^2 \to \mathbb{R}$ and a vector $v$ of length 1 ($\|v\|_2 = 1$). Then 
$g(t) := f(x_0 + tv)$ represents the curve formed by the intersection of the surface 
$y = f(x_1, x_2)$ with the vertical plane $\{(x, y) \mid x = x_0 + tv,\ t \in \mathbb{R}\}$. Its derivative 
is denoted by 

\[ \frac{\partial f}{\partial v}(x_0) := \lim_{h \to 0} \frac{f(x_0 + hv) - f(x_0)}{h} \tag{3.15} \]

and is called the directional derivative of $f$ (in the direction of $v$). Partial derivatives 
are obtained for $v = (1, 0)^T$ and $v = (0, 1)^T$. 

Consider the function 

\[
f(x_1, x_2) = \begin{cases}
\dfrac{x_1^2\, x_2}{x_1^4 + x_2^2} & \text{if } x_1^2 + x_2^2 > 0 \\[2mm]
0 & \text{if } x_1 = x_2 = 0.
\end{cases}
\tag{3.16}
\]

For $v = (\cos\theta, \sin\theta)^T$ we get 

\[ g(t) = f(tv) = \frac{t \cos^2\theta\, \sin\theta}{t^2 \cos^4\theta + \sin^2\theta}. \]

This function is differentiable at $t = 0$ for any value of $\theta$ (observe that for $\sin\theta = 0$ 
we have $g(t) = 0$ for all $t$). Hence, all directional derivatives exist. However, on 
the parabolas $x_2 = a x_1^2$ the function is constant, namely $f(x_1, a x_1^2) = a/(1 + a^2)$, 
and all values between $-1/2$ and $1/2$ are assumed in each neighborhood of the 
origin (see Fig. 3.3). Thus, it is not continuous there. 
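
Both properties of the function (3.16) can be observed numerically. The following Python sketch (standard library only; function names and sample values are illustrative assumptions) evaluates the difference quotient of (3.15) at the origin and the values of $f$ along a parabola $x_2 = a x_1^2$.

```python
import math

def f(x1, x2):
    """Function (3.16), with the value 0 at the origin."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1**2 * x2 / (x1**4 + x2**2)

def directional_quotient(theta, h):
    """Difference quotient of (3.15) at x0 = 0 for v = (cos theta, sin theta)."""
    v1, v2 = math.cos(theta), math.sin(theta)
    return (f(h * v1, h * v2) - f(0.0, 0.0)) / h

def on_parabola(a, x1):
    """Value of f on the parabola x2 = a * x1^2."""
    return f(x1, a * x1**2)

# The quotients settle down as h -> 0 (here to cos^2(theta)/sin(theta)) ...
for h in (1e-2, 1e-4, 1e-6):
    print(h, directional_quotient(math.pi / 4, h))

# ... yet f keeps the value a/(1 + a^2) arbitrarily close to the origin.
for x1 in (1e-1, 1e-3, 1e-6):
    print(x1, on_parabola(0.5, x1))
```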



FIGURE 3.3. The function (3.16) (stereogram) 


A Geometrical Interpretation of the Gradient 


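For a function $f : U \to \mathbb{R}$, i.e., the case $m = 1$ and $n$ arbitrary, the matrix $f'(x_0)$ 
of (3.5) is a row vector. It is usually denoted by 

\[ \operatorname{grad} f(x_0) = \nabla f(x_0) = \Big( \frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0) \Big). \tag{3.17} \]

Here, the formal vector (Hamilton 1853, art. 620) 

\[ \nabla = \Big( \frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \ldots, \frac{\partial}{\partial x_n} \Big) \]

is called Nabla “owing to its fancied resemblance to an Assyrian harp” (J. W. Gibbs 
1907, p. 138). Equation (3.10) then becomes 

\[ f(x) = f(x_0) + \operatorname{grad} f(x_0) \cdot (x - x_0) + r(x)\,\|x - x_0\|, \tag{3.18} \]

and the equation $y = f(x_0) + \operatorname{grad} f(x_0) \cdot (x - x_0)$ of the tangent plane to the 
surface $y = f(x)$ (see (3.2)) appears again. 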
In order to investigate the function $f(x)$ in a neighborhood of $x_0$, we put 
$x = x_0 + tv$ and neglect the last term in (3.18). This yields 

\[ f(x_0 + tv) = f(x_0) + t \operatorname{grad} f(x_0) \cdot v + \ldots . \tag{3.19} \]

Assuming that $v$ is a vector of length 1, we can deduce the following properties: 

• The vector $\operatorname{grad} f(x_0)$ is orthogonal to the level curve $\{x \,;\, f(x) = f(x_0)\}$. 
This follows from (3.19) if we let $t \to 0$, because $f(x_0 + tv) = f(x_0)$ implies 
$\operatorname{grad} f(x_0) \cdot v = 0$. 

• The function increases in directions $v$ where $\operatorname{grad} f(x_0) \cdot v > 0$. Because of 
the Cauchy-Schwarz inequality (1.5), $v = \operatorname{grad} f(x_0) / \|\operatorname{grad} f(x_0)\|$ is the 
direction in which $f(x)$ increases fastest. The direction of steepest descent is 
the opposite vector $v = -\operatorname{grad} f(x_0) / \|\operatorname{grad} f(x_0)\|$. 

• If $f(x)$ has a maximum (or minimum) at $x_0$, then we get the necessary condition 
$\operatorname{grad} f(x_0) = 0$. 



FIGURE 3.4. Level curves and gradients for the function (3.20) 

Fig. 3.4 shows the level curves $f(x) = C$ (with $C = i/20$; $i = 1, \ldots, 30$) 
for the function 

\[ f(x_1, x_2) = x_1^2 - 4 x_1 x_2 + 5 x_2^2. \tag{3.20} \]

Its gradient $\operatorname{grad} f(x_1, x_2) = (2x_1 - 4x_2,\ -4x_1 + 10x_2)$ is indicated by arrows. 
We observe that the gradient is orthogonal to the level curve and that the length of 
$\operatorname{grad} f(x_0)$ indicates the steepness of the surface $y = f(x)$. 
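
The geometric properties of the gradient can be tested on the function (3.20). The sketch below (Python, standard library only; the base point $(1, 1)$ and the helper names are assumptions chosen for illustration) measures the slope of $t \mapsto f(x_0 + tv)$ along the gradient direction and along the direction orthogonal to it.

```python
import math

def f(x1, x2):
    """Function (3.20)."""
    return x1**2 - 4.0 * x1 * x2 + 5.0 * x2**2

def grad_f(x1, x2):
    """Gradient of (3.20) as stated in the text."""
    return (2.0 * x1 - 4.0 * x2, -4.0 * x1 + 10.0 * x2)

def slope(x, v, t=1e-6):
    """Central difference quotient of t -> f(x + t v) at t = 0."""
    fp = f(x[0] + t * v[0], x[1] + t * v[1])
    fm = f(x[0] - t * v[0], x[1] - t * v[1])
    return (fp - fm) / (2.0 * t)

x0 = (1.0, 1.0)                            # illustrative base point
g = grad_f(*x0)
norm_g = math.hypot(g[0], g[1])
ascent = (g[0] / norm_g, g[1] / norm_g)    # unit vector along grad f(x0)
tangent = (-ascent[1], ascent[0])          # unit vector orthogonal to grad f(x0)

print(slope(x0, tangent))          # close to 0: tangent to the level curve
print(slope(x0, ascent), norm_g)   # both close to the length of the gradient
```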

The Chain Rule. We consider two functions 

\[ \mathbb{R}^n \xrightarrow{\ f\ } \mathbb{R}^m \xrightarrow{\ g\ } \mathbb{R}^p \]

and study the differentiability of the composed function $(g \circ f)(x) = g(f(x))$. 
As in Sect. III.6, we use Carathéodory’s characterization (here Lemma 3.5). Assuming 
that $f$ is differentiable at $x_0$ and $g$ at $y_0 = f(x_0)$, we have 

\[ f(x) = f(x_0) + \varphi(x)(x - x_0), \qquad g(y) = g(y_0) + \psi(y)(y - y_0). \]

Putting $y = f(x)$, $y_0 = f(x_0)$ and inserting the first equation into the second one, 
we obtain 

\[ g(f(x)) = g(f(x_0)) + \psi(f(x))\,\varphi(x)(x - x_0). \tag{3.21} \]

Since the product $\psi(f(x))\,\varphi(x)$ is continuous at $x_0$, the derivative of $g \circ f$ is this 
expression evaluated at $x_0$, i.e., 

\[ (g \circ f)'(x_0) = g'(y_0) \cdot f'(x_0). \tag{3.22} \]

Written in coordinates, with $z = g(y)$ and $y = f(x)$, the product (3.22) becomes 

\[ \frac{\partial z_i}{\partial x_k} = \sum_{j=1}^{m} \frac{\partial z_i}{\partial y_j}\, \frac{\partial y_j}{\partial x_k}, \tag{3.23} \]

which generalizes Leibniz’s formula (Eq. (II.1.16)). 



Example. Suppose that the motion of an elastic pendulum is given in polar coordinates 
$f(t) = (r(t), \varphi(t))$, see Fig. 3.5.¹ If we want to know the velocity in 
cartesian coordinates 

\[ x = r \cos\varphi, \qquad y = r \sin\varphi, \tag{3.24} \]

we have to differentiate $x$ and $y$ with respect to $t$. Since the Jacobi matrix of (3.24) 
is given by 

\[ g'(r, \varphi) = \begin{pmatrix} \cos\varphi & -r \sin\varphi \\ \sin\varphi & r \cos\varphi \end{pmatrix}, \tag{3.25} \]

we obtain from (3.22) that 

\[ \dot{x} = \cos\varphi \cdot \dot{r} - r \sin\varphi \cdot \dot{\varphi}, \qquad \dot{y} = \sin\varphi \cdot \dot{r} + r \cos\varphi \cdot \dot{\varphi} \]

(the derivative with respect to time $t$ is denoted by a dot). This permits, for example, 
the computation of the kinetic energy 

\[ T(t) = \frac{m}{2}\,(\dot{x}^2 + \dot{y}^2) = \frac{m}{2}\,(\dot{r}^2 + r^2 \dot{\varphi}^2). \]

¹ The curves of this figure are the solutions of differential equations and were calculated 
by numerical methods (see Sect. II.9). 
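
The chain rule computation for polar coordinates can be verified on a concrete motion. In the following Python sketch the functions `r` and `phi` are hypothetical (they are not the pendulum solution of Fig. 3.5, which requires a differential equation solver), and the mass is taken as $m = 1$; the sketch compares the velocity obtained from (3.22) and (3.25) with a finite difference of the cartesian position.

```python
import math

# Hypothetical smooth motion in polar coordinates (an assumption, not from the text).
def r(t):       return 1.0 + 0.1 * math.sin(3.0 * t)
def r_dot(t):   return 0.3 * math.cos(3.0 * t)
def phi(t):     return 0.5 * math.cos(t)
def phi_dot(t): return -0.5 * math.sin(t)

def position(t):
    """Cartesian coordinates (3.24)."""
    return (r(t) * math.cos(phi(t)), r(t) * math.sin(phi(t)))

def velocity_chain_rule(t):
    """(x_dot, y_dot) = g'(r, phi) applied to (r_dot, phi_dot), cf. (3.22), (3.25)."""
    c, s = math.cos(phi(t)), math.sin(phi(t))
    return (c * r_dot(t) - r(t) * s * phi_dot(t),
            s * r_dot(t) + r(t) * c * phi_dot(t))

def velocity_finite_difference(t, h=1e-6):
    """Central difference quotient of the cartesian position."""
    (xp, yp), (xm, ym) = position(t + h), position(t - h)
    return ((xp - xm) / (2.0 * h), (yp - ym) / (2.0 * h))

def kinetic_energies(t):
    """Kinetic energy for m = 1, in cartesian and in polar form."""
    vx, vy = velocity_chain_rule(t)
    return (0.5 * (vx**2 + vy**2),
            0.5 * (r_dot(t)**2 + r(t)**2 * phi_dot(t)**2))

print(velocity_chain_rule(0.7), velocity_finite_difference(0.7))
print(kinetic_energies(0.7))
```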


The Mean Value Theorem 


We wish to generalize the formula $f(b) - f(a) = f'(\xi)(b - a)$ of Lagrange’s 
Theorem (Sect. III.6) to several variables. 

The Case m = 1. Consider a function $f : \mathbb{R}^n \to \mathbb{R}$ and let two points $a \in \mathbb{R}^n$ 
and $b \in \mathbb{R}^n$ be given. The idea is to connect these points by a straight line 

\[ x = a + (b - a)t, \qquad 0 \le t \le 1, \]

and to put 

\[ g(t) := f(a + (b - a)t). \]

If $f(x)$ is differentiable at all points of the segment 
$\{a + (b - a)t \,;\, t \in (0, 1)\}$, $g(t)$ is also 
differentiable, and it follows from (3.22) that 

\[ g'(t) = f'(a + (b - a)t)(b - a). \]

Since $g(0) = f(a)$, $g(1) = f(b)$, Theorem III.6.11 applied to the function $g(t)$ 
gives $g(1) - g(0) = g'(\tau)(1 - 0)$, and hence also 

\[ f(b) - f(a) = f'(\xi)(b - a), \tag{3.26} \]

where $\xi$ is a point on the segment connecting $a$ and $b$. Equation (3.26) looks like 
(III.6.14), but here $f'(\xi)(b - a)$ is the scalar product of two vectors. 

The General Case. For a function $f : \mathbb{R}^n \to \mathbb{R}^m$ we can apply (3.26) to each 
component of $f(x)$. This gives 

\[
\begin{pmatrix} f_1(b) - f_1(a) \\ \vdots \\ f_m(b) - f_m(a) \end{pmatrix}
= \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(\xi_1) & \cdots & \dfrac{\partial f_1}{\partial x_n}(\xi_1) \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(\xi_m) & \cdots & \dfrac{\partial f_m}{\partial x_n}(\xi_m)
\end{pmatrix}
\begin{pmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{pmatrix},
\tag{3.27}
\]

where all $\xi_i \in \mathbb{R}^n$ lie on the segment between $a$ and $b$. The drawback of this 
formula is that the argument $\xi_i$ is different in each row. 
We cannot hope that (3.26) is true for all functions $f : \mathbb{R}^n \to \mathbb{R}^m$. A counterexample 
is $f_1(x) = \cos x$, $f_2(x) = \sin x$, $a = 0$, $b = 2\pi$. If we are content with 
an inequality, the situation is as follows. 

(3.7) Theorem. Let $f : U \to \mathbb{R}^m$, $U \subset \mathbb{R}^n$ be differentiable at all points of the 
“open” segment $(a, b) := \{x = a + (b - a)t \,;\, 0 < t < 1\}$ (these points are 
assumed to be interior points of $U$) and suppose that in the norm (2.14) 

\[ \|f'(x)\| \le M \qquad \text{for all } x \in (a, b). \]

Then, we have 

\[ \|f(b) - f(a)\| \le M \cdot \|b - a\|. \tag{3.28} \]

Proof. The idea is to consider the function 

\[ g(t) := \sum_{i=1}^{m} c_i\, f_i(a + (b - a)t) = c^T f(a + (b - a)t), \tag{3.29} \]

where the coefficients $c_1, \ldots, c_m$ are arbitrary for the moment. The derivative of 
$g(t)$ is 

\[ g'(t) = \sum_{i=1}^{m} c_i \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(a + (b - a)t)\,(b_j - a_j) = c^T f'(a + (b - a)t)\,(b - a). \]

Application of Theorem III.6.11 now yields 

\[ c^T (f(b) - f(a)) = g(1) - g(0) = g'(\tau) = c^T f'(\xi)(b - a), \tag{3.30} \]

where $\xi = a + (b - a)\tau$ lies on the segment $(a, b)$. We now cleverly choose $c = 
f(b) - f(a)$ to make the expression to the left in (3.30) as large as possible. Then, 
applying the Cauchy-Schwarz inequality on the right of Eq. (3.30), we obtain with 
(2.15) that 

\[ \|f(b) - f(a)\|^2 \le \|f(b) - f(a)\| \cdot M \cdot \|b - a\|. \]

This gives (3.28) after division by $\|f(b) - f(a)\|$ (note that for $\|f(b) - f(a)\| = 0$ 
statement (3.28) is obvious). □ 
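
The counterexample $f(x) = (\cos x, \sin x)^T$ mentioned before Theorem 3.7 is easy to examine numerically. The Python sketch below (standard library only; the sampling of 99 intermediate points is an arbitrary illustrative choice) shows that no $\xi$ can satisfy the equality (3.26), while the inequality (3.28) holds with $M = 1$.

```python
import math

def f(x):
    """Counterexample f(x) = (cos x, sin x)."""
    return (math.cos(x), math.sin(x))

def f_prime(x):
    """Derivative of f, a vector of Euclidean length 1 for every x."""
    return (-math.sin(x), math.cos(x))

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))

a, b = 0.0, 2.0 * math.pi
increment = tuple(fb - fa for fb, fa in zip(f(b), f(a)))

# Length of f'(xi)(b - a) on a sample of points xi of the open segment (a, b).
samples = [a + (b - a) * k / 100.0 for k in range(1, 100)]
smallest = min(norm(f_prime(xi)) * (b - a) for xi in samples)

M = 1.0  # bound for the norm of f'(x)

print(norm(increment))   # practically 0: f(b) = f(a)
print(smallest)          # about 2*pi: the equality (3.26) must fail
print(M * (b - a))       # right-hand side of (3.28)
```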


The Implicit Function Theorem 

Implicit equations $f(x, y) = C$ were the central theme of Descartes’s “Géométrie” 
of 1637 (see, for example, Eq. (I.1.18)). Nobody doubted that such equations define 
geometric curves $y = y(x)$, and Leibniz knew how to differentiate such 
functions. However, in the Weierstrass era (see Genocchi-Peano 1884, pp. 149-151), 
mathematicians felt a need for a more rigorous proof that guarantees that 
$f(x, y) = C$ is equivalent to $y = y(x)$ in some neighborhood of a point $(x_0, y_0)$ 
satisfying $f(x_0, y_0) = C$. We then say that the implicit equation $f(x, y) = C$ can 
be solved for $y$. 

Consider, for example, the circles $x^2 + y^2 = C$ and fix a point $(x_0, y_0)$ 
satisfying $x_0^2 + y_0^2 = C$. If $y_0 > 0$, we obtain $y(x) = \sqrt{C - x^2}$, for $y_0 < 0$ we 
have $y(x) = -\sqrt{C - x^2}$, but for $y_0 = 0$ it is impossible to find a function $y(x)$ 
that satisfies $x^2 + y(x)^2 = C$ for all $x$ in a neighborhood of $x_0$. 

In the sequel, we put $F(x, y) = f(x, y) - C$ and replace the condition 
$f(x, y) = C$ by $F(x, y) = 0$. 


(3.8) Implicit Function Theorem. Consider a function $F : \mathbb{R}^2 \to \mathbb{R}$ and a point 
$(x_0, y_0) \in \mathbb{R}^2$, and suppose that the partial derivatives $\partial F/\partial x$ and $\partial F/\partial y$ exist 
and are continuous in a neighborhood of $(x_0, y_0)$. If 

\[ F(x_0, y_0) = 0 \qquad \text{and} \qquad \frac{\partial F}{\partial y}(x_0, y_0) \ne 0, \tag{3.31} \]

then there exist neighborhoods $U$ of $x_0$, $V$ of $y_0$, and a unique function $y : U \to V$ 
such that $y(x_0) = y_0$ and 

\[ F(x, y(x)) = 0 \qquad \text{for all } x \in U. \tag{3.32} \]

The function $y(x)$ is differentiable in $U$ and satisfies 

\[ y'(x) = -\frac{\partial F/\partial x\,(x, y(x))}{\partial F/\partial y\,(x, y(x))}. \tag{3.33} \]

Proof. We may assume that $\partial F/\partial y\,(x_0, y_0) > 0$ (otherwise we work with $-F$ 
instead of $F$). By continuity of $\partial F/\partial y$, there exist $\delta > 0$ and $\beta > 0$ such that 

\[ \frac{\partial F}{\partial y}(x, y) \ge \beta > 0 \qquad \text{for } |x - x_0| \le \delta \text{ and } |y - y_0| \le \delta. \tag{3.34} \]

This implies that $F(x_0, y)$ is a monotonically increasing function of $y$, and, since 
$F(x_0, y_0) = 0$, we have $F(x_0, y_0 - \delta) < 0 < F(x_0, y_0 + \delta)$. The continuity of $F$ 
implies the existence of $\delta_1 > 0$ ($\delta_1 \le \delta$) such that (see Fig. 3.6) 

\[ F(x, y_0 - \delta) < 0 < F(x, y_0 + \delta) \qquad \text{for } |x - x_0| < \delta_1. \]

We now put $U = (x_0 - \delta_1, x_0 + \delta_1)$, $V = (y_0 - \delta, y_0 + \delta)$ and apply for each 
$x \in U$ Bolzano’s Theorem III.3.5 to $F(x, y)$, considered as a function of $y$. This 
implies the existence of a function $y : U \to V$ satisfying (3.32). The uniqueness 
of $y(x)$ in $V$ follows from the monotonicity of $F(x, y)$ as a function of $y$. 

We still have to prove that $y(x)$ is differentiable at an arbitrary point $x_1 \in U$. 
As in the proof of Theorem 3.6, we use the relation 

\[ F(x, y(x)) = F(x_1, y_1) + \frac{\partial F}{\partial x}(\xi, y(x))\,(x - x_1) + \frac{\partial F}{\partial y}(x_1, \eta)\,(y(x) - y_1), \]
FIGURE 3.6. Proof of the Implicit Function Theorem 


where $y_1 = y(x_1)$, $\xi$ is between $x$ and $x_1$, and $\eta$ is between $y(x)$ and $y_1$. From (3.32) 
and (3.34), we thus obtain 

\[ y(x) - y_1 = \varphi(x)(x - x_1), \qquad \varphi(x) = -\frac{\partial F/\partial x\,(\xi, y(x))}{\partial F/\partial y\,(x_1, \eta)}. \tag{3.35} \]

The function $\partial F/\partial x$ is continuous and thus bounded for $|x - x_0| \le \delta_1$ and 
$|y - y_0| \le \delta$, say by $M$. This, together with (3.34), implies $|\varphi(x)| \le M/\beta$, 
and the continuity of $y(x)$ is a consequence of (3.35). Once the continuity of $y(x)$ 
is proved, $\varphi(x)$ is seen to be continuous at $x_1$, so that $y(x)$ is differentiable at $x_1$. 
Formula (3.33) is obtained by computing $\lim_{x \to x_1} \varphi(x)$. □ 

Remark. If the differentiability of the function $y(x)$ is established, Eq. (3.33) is 
obtained by differentiating the identity $F(x, y(x)) = 0$. This procedure is called 
implicit differentiation and has been used already at the end of Sect. II.1. 
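
The construction in the proof can be imitated on the circle $x^2 + y^2 = 1$ near the point $(0.6, 0.8)$. The Python sketch below is only an illustration under stated assumptions: the interval $V = (0.3, 1.3)$, the number of bisection steps, and the function names are choices made here, not taken from the text. For each $x$ it finds $y(x)$ by bisection (the computational counterpart of Bolzano’s theorem applied to the increasing function $y \mapsto F(x, y)$) and compares a difference quotient of $y(x)$ with formula (3.33).

```python
def F(x, y, C=1.0):
    """F(x, y) = x^2 + y^2 - C for the circle example."""
    return x * x + y * y - C

def solve_y(x, y_lo=0.3, y_hi=1.3, iterations=60):
    """Bisection in y on V = (y_lo, y_hi), where y -> F(x, y) is increasing."""
    lo, hi = y_lo, y_hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def slope_formula(x):
    """Formula (3.33): y'(x) = -(dF/dx) / (dF/dy) = -(2x) / (2y)."""
    y = solve_y(x)
    return -(2.0 * x) / (2.0 * y)

def slope_finite_difference(x, h=1e-4):
    """Central difference quotient of the implicitly defined function."""
    return (solve_y(x + h) - solve_y(x - h)) / (2.0 * h)

print(solve_y(0.6))                                      # close to 0.8
print(slope_formula(0.6), slope_finite_difference(0.6))  # both close to -0.75
```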


Differentiation of Integrals with Respect to Parameters 

We now wish to know whether an integral containing a parameter $p$ (see Eq. (2.23)) 
is a differentiable function of $p$ and if so, whether its derivative can be computed 
by exchanging integration and differentiation, i.e., by integrating $\partial f/\partial p$. 

(3.9) Example. The integral 

\[ \int_0^{\pi/2} e^{ax} \cos x \, dx = \frac{e^{a\pi/2} - a}{a^2 + 1} \tag{3.36} \]

is best computed by taking the real part of $\int_0^{\pi/2} e^{(a+i)x}\, dx$. If we differentiate 
both sides of (3.36) several times with respect to the parameter $a$, we obtain 

\[ \int_0^{\pi/2} x^n e^{ax} \cos x \, dx = \frac{d^n}{da^n} \Big( \frac{e^{a\pi/2} - a}{a^2 + 1} \Big), \tag{3.37} \]

a formula that would be much more difficult to obtain by other means. 
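
Example 3.9 can be cross-checked for $n = 0$ and $n = 1$ with a simple quadrature. In the Python sketch below, the composite Simpson rule, the value $a = 1$, and the hand-computed derivative `closed_form_da` (obtained here with the quotient rule) are assumptions made for the illustration.

```python
import math

def simpson(g, lo, hi, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return s * h / 3.0

def closed_form(a):
    """Right-hand side of (3.36)."""
    return (math.exp(a * math.pi / 2.0) - a) / (a * a + 1.0)

def closed_form_da(a):
    """Derivative of the right-hand side of (3.36) with respect to a."""
    e = math.exp(a * math.pi / 2.0)
    num = (math.pi / 2.0 * e - 1.0) * (a * a + 1.0) - (e - a) * 2.0 * a
    return num / (a * a + 1.0) ** 2

def integral(a, power=0):
    """Integral of x^power * e^(ax) * cos x over [0, pi/2]."""
    return simpson(lambda x: x**power * math.exp(a * x) * math.cos(x),
                   0.0, math.pi / 2.0)

print(integral(1.0, 0), closed_form(1.0))      # Eq. (3.36)
print(integral(1.0, 1), closed_form_da(1.0))   # Eq. (3.36) differentiated once
```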
(3.10) Counterexample. Looking at Fig. III.5.5a, we observe that the integral of 
$f_n(x)$ behaves like $C/n$ for $n \to \infty$. This suggests the definition 

\[ f(x, p) = \frac{x/p}{(1 + x^2/p^2)^2} = \frac{x p^3}{(p^2 + x^2)^2} \qquad \text{for } p^2 + x^2 > 0 \tag{3.38} \]

and $f(0, 0) = 0$. Then, 

\[ F(p) = \int_0^1 f(x, p)\, dx = \frac{p}{2(p^2 + 1)} \tag{3.39} \]

has the derivative $F'(0) = 1/2$. On the other hand, $\lim_{p \to 0} \frac{\partial f}{\partial p}(x, p)$ is identically 
zero (see Fig. 3.7). 
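
The failure of the exchange in Counterexample 3.10 can be made visible with a few lines of Python. The sketch is illustrative only: the Simpson rule, the sample values, and the expression in `df_dp` (computed here by differentiating (3.38) with the quotient rule) are not part of the text.

```python
def f(x, p):
    """Function (3.38), with f(0, 0) = 0."""
    if x == 0.0 and p == 0.0:
        return 0.0
    return x * p**3 / (p * p + x * x) ** 2

def df_dp(x, p):
    """Partial derivative of (3.38) with respect to p, for p^2 + x^2 > 0."""
    return x * p * p * (3.0 * x * x - p * p) / (p * p + x * x) ** 3

def F_closed(p):
    """Closed form (3.39) of the integral of f(x, p) over 0 <= x <= 1."""
    return p / (2.0 * (p * p + 1.0))

def simpson(g, lo, hi, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return s * h / 3.0

# (3.39) agrees with a direct quadrature for a moderate value of p ...
print(simpson(lambda x: f(x, 0.5), 0.0, 1.0), F_closed(0.5))
# ... the difference quotient of F at p = 0 tends to 1/2 ...
print((F_closed(1e-4) - F_closed(0.0)) / 1e-4)
# ... but df/dp vanishes at p = 0 for every x > 0, so its integral is 0.
print([df_dp(x, 0.0) for x in (0.1, 0.5, 1.0)])
```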



FIGURE 3.7. The function (3.38) (stereogram) 


(3.11) Theorem. Consider a function $f : [a, b] \times [c, d] \to \mathbb{R}$ and suppose that the 
partial derivative $\frac{\partial f}{\partial p}(x, p)$ exists and is continuous on $[a, b] \times [c, d]$. If the integral 

\[ F(p) := \int_a^b f(x, p)\, dx \tag{3.40} \]

exists for all $p \in [c, d]$, then $F(p)$ is differentiable in $(c, d)$ with derivative 

\[ F'(p_0) = \int_a^b \frac{\partial f}{\partial p}(x, p_0)\, dx. \tag{3.41} \]

Proof. We consider the difference 

\[ F(p) - F(p_0) = \int_a^b \big( f(x, p) - f(x, p_0) \big)\, dx. \tag{3.42} \]

To the term on the right, we apply Lagrange’s Theorem III.6.11, which gives 

\[ F(p) - F(p_0) = \varphi(p)(p - p_0), \qquad \varphi(p) = \int_a^b \frac{\partial f}{\partial p}(x, \eta)\, dx. \]

Here, $\eta$ depends on $x$ and lies between $p$ and $p_0$. Since $\partial f/\partial p$ is continuous on 
the compact set $[a, b] \times [c, d]$, we see as in the proof of Theorem 2.12 that $\varphi(p)$ is 
continuous at $p_0$ and the statement follows from Eq. (III.6.6). □ 


Exercises 

3.1 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ (see Fig. 3.8a), 


f(x i,x 2 ) = 


— if x\ + x\ > 0 
1 if x\ = X 2 = 0. 

Is $f$ continuous? Does it have directional derivatives at the origin? Are the 
partial derivatives $\partial f/\partial x_1$ and $\partial f/\partial x_2$ continuous? Is $f$ differentiable? 



FIGURE 3.8. Stereograms for Exercises 3.1, 3.2, and 3.3 


3.2 The same questions as before for the function $f(x_1, x_2) = \sqrt{|x_1 x_2|}$ (see 
Fig. 3.8c; the Sydney Opera House). 

3.3 Show that $f : \mathbb{R}^2 \to \mathbb{R}$ defined by 


f(x i,x 2 ) 


X\ X'2 Slll^ 
0 


if x\ + x\ > 0 
if x\ = X 2 = 0 


(see Fig. 3.8b) is everywhere differentiable, but that the partial derivatives are 
not continuous at the origin. This function is a bidimensional analog of the 
function of Fig. III.6.1. 
FIGURE 3.9. Bernoulli’s lemniscate and Cassinian ovals 


3.4 For a given constant $a$ define $f : \mathbb{R}^2 \to \mathbb{R}$ by 




if $x_1 x_2 \ne 0$ 
if $x_1 x_2 = 0$. 


Determine the values of the parameter $a$ for which (a) $f$ is continuous and 
(b) $f$ is differentiable. 

3.5 Show that for the function $f : \mathbb{R}^n \to \mathbb{R}$ defined by $f(x) = x^T A x$, where $A$ 
is a constant $n \times n$ matrix, the derivative is given by $f'(x) = x^T (A + A^T)$ 
(in case of trouble, write explicitly the components of $f$ for $n = 2$). 

3.6 Let $V(x, y)$ be a differentiable function and 

\[ W(r, \varphi) := V(r \cos\varphi, r \sin\varphi). \]

Apply the chain rule to show that 

\[ \Big(\frac{\partial V}{\partial x}\Big)^2 + \Big(\frac{\partial V}{\partial y}\Big)^2 = \Big(\frac{\partial W}{\partial r}\Big)^2 + \frac{1}{r^2} \Big(\frac{\partial W}{\partial \varphi}\Big)^2. \]


3.7 We call a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ homogeneous of degree $p$ if 

\[ f(ax) = a^p f(x) \qquad \text{for } a > 0,\ x \in \mathbb{R}^n. \tag{3.43} \]

Show that the functions t an (;e 1 / 2 - 2 ), y/2a;i + 2>x\ + 4a;§, and x\ — 5x\x\ + 
x\xz are homogeneous (of which degree?) and show that a homogeneous 
function satisfies Euler’s identity 

\[ x_1 \frac{\partial f}{\partial x_1}(x) + \cdots + x_n \frac{\partial f}{\partial x_n}(x) = p\, f(x). \]

Hint. Differentiate (3.43) with respect to a. 

3.8 Study the functions $y(x)$ defined by the implicit equation 

\[ (x^2 + y^2)^2 - 2x^2 + 2y^2 = C, \tag{3.44} \]

which yields, for $C = 0$, the famous “lemniscate” of Jac. Bernoulli (1694, 
see Fig. 3.9). Find the locus of points at which $\partial F/\partial y = 0$, i.e., the points 
at which the Implicit Function Theorem does not apply. Also find the locus 
of maximal values of the solutions of (3.44), i.e., points at which $y'(x) = 0$, 
and show that they lie on a circle. 
3.9 Same question for the “folium cartesii” 

\[ x^3 + y^3 = 3xy \]

(see Fig. 4.2 below). 

3.10 Compute the points $x \in \mathbb{R}^2$ where the columns of the matrix $f'(x)$ of (3.7) 
are vectors with the same direction (i.e., $\det f'(x) = 0$). These points are 
marked by “o” and “x”, respectively, in Fig. 3.2. 

Answer. $((k + l + 3/4)\pi,\ (k - l + 1/4)\pi)$ and $((k + l + 3/4)\pi,\ (k - l - 3/4)\pi)$ 
for $k, l \in \mathbb{Z}$. 

3.11 Which of the following two integrals do you think is easier to evaluate: 



Well, the second one can be differentiated with respect to the parameter a. 
Do this (after justification) and compute the two integrals. 

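3.12 Given that 

\[ \int_0^{\pi} \frac{dx}{a - \cos x} = \frac{\pi}{\sqrt{a^2 - 1}}, \]

verify that 

\[ \int_0^{\pi} \frac{dx}{(5 - \cos x)^2} = \frac{5\sqrt{6}\,\pi}{288} \qquad \text{and} \qquad \int_0^{\pi} \frac{dx}{(6 - 4\cos x)^3} = \frac{11\sqrt{5}\,\pi}{1000}. \]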


3.13 Show that 

\[ \int_0^{a} \frac{\log(1 + ax)}{x^2 + 1}\, dx = \frac{1}{2} \arctan(a) \cdot \log(1 + a^2) \qquad \text{for } a > 0. \]

Hint. Differentiate the integral with respect to $a$, after justification. 

3.14 Show that 



Hint. Show, with the help of Definition III.8.1, Theorems 3.11 and III.6.18, 
and Exercise II.4.2.h, that 

\[ F(a) = \int_0^{\infty} e^{-ax}\, \frac{\sin x}{x}\, dx \qquad \text{has the derivative} \qquad F'(a) = -\int_0^{\infty} e^{-ax} \sin x \, dx = -\frac{1}{1 + a^2} \]

if $a > 0$. Finally, by modifying the proof of Example III.8.5, show that $F(a)$ 
is one-sided continuous at $a = 0+$. 
IV.4 Higher Derivatives and Taylor Series 


Now it is easy to see that differentials of this kind keep the same value if one 
exchanges the order of differentiation with respect to the several variables. 

(Cauchy 1823, Résumé , p. 76) 

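For the moment we consider functions $f(x, y)$ of two variables. Partial derivatives, 
such as $\partial f/\partial x$, are again functions of two variables, and we can repeatedly 
compute their partial derivatives as indicated in the following diagram: 

\[
\begin{array}{ccccc}
 & & f(x, y) & & \\[1mm]
 & \dfrac{\partial f}{\partial x} & & \dfrac{\partial f}{\partial y} & \\[3mm]
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial y\,\partial x} & \overset{?}{=} & \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} \\[3mm]
\cdots & \dfrac{\partial^3 f}{\partial y^2\,\partial x} & \overset{?}{=} & \dfrac{\partial^3 f}{\partial x\,\partial y^2} & \cdots
\end{array}
\]

The question is whether these derivatives depend on the order of differentiation. 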


(4.1) Example. Following Euler (1734, Comm. Acad. Petrop., vol. VII, p. 177), 
we consider the function $f(x, y) = \sqrt{x^2 + n y^2}$ and compute partial derivatives 
(for $x^2 + n y^2 > 0$): 

\[
\begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= \frac{x}{\sqrt{x^2 + n y^2}}, & \frac{\partial^2 f}{\partial y\,\partial x}(x, y) &= \frac{-nxy}{(x^2 + n y^2)^{3/2}}, \\
\frac{\partial f}{\partial y}(x, y) &= \frac{ny}{\sqrt{x^2 + n y^2}}, & \frac{\partial^2 f}{\partial x\,\partial y}(x, y) &= \frac{-nxy}{(x^2 + n y^2)^{3/2}}.
\end{aligned}
\]

Euler then announces (see also Euler 1755, §226) that in general, 

\[ \frac{\partial^2 f}{\partial y\,\partial x}(x, y) = \frac{\partial^2 f}{\partial x\,\partial y}(x, y). \tag{4.1} \]

This, however, is not true without any further assumptions, as can be seen from 
the following counterexample. 


(4.2) Counterexample. H.A. Schwarz (1873) gave a first, rather complicated 
counterexample for (4.1) (see Exercise 4.1). An easier counterexample, due to 
Peano (1884, “Annotazione N. 103”), is obtained by considering 

\[ f(x, y) = x\, y\, g(x, y), \tag{4.2} \]

where $g(x, y)$ is bounded (not necessarily continuous) in a neighborhood of the 
origin. For this function we have 

\[ \frac{\partial f}{\partial x}(0, y) = \lim_{x \to 0} \frac{f(x, y) - f(0, y)}{x} = \lim_{x \to 0} y\, g(x, y). \]

The derivative of this expression with respect to $y$ is 

\[ \frac{\partial^2 f}{\partial y\,\partial x}(0, 0) = \lim_{y \to 0} \Big( \lim_{x \to 0} g(x, y) \Big), \tag{4.3} \]

provided that this limit exists. Similarly, we have 

\[ \frac{\partial^2 f}{\partial x\,\partial y}(0, 0) = \lim_{x \to 0} \Big( \lim_{y \to 0} g(x, y) \Big). \tag{4.4} \]

We only have to choose a function $g(x, y)$ for which the limits in (4.3) and (4.4) 
are different. This is the case for 

\[ g(x, y) = \frac{x^2 - y^2}{x^2 + y^2} \qquad \text{if } x^2 + y^2 > 0, \tag{4.5} \]

for which $\lim_{x \to 0} g(x, y) = -1$ for all $y \ne 0$ and $\lim_{y \to 0} g(x, y) = +1$ for all 
$x \ne 0$. Hence, the mixed partial derivatives 

\[ \frac{\partial^2 f}{\partial y\,\partial x}(0, 0) = -1 \qquad \text{and} \qquad \frac{\partial^2 f}{\partial x\,\partial y}(0, 0) = +1 \]

are different for the function defined by (4.2) and (4.5). 
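
Peano’s counterexample can be reproduced with nested difference quotients. The following Python sketch is an illustration under assumptions: the step sizes `h` and `k` are chosen here (with `h` much smaller than `k`, so that the inner limit is taken first), and the variable names are not from the text.

```python
def f(x, y):
    """f = x*y*g(x, y) with g from (4.5); f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def df_dx(x, y, h=1e-7):
    """Central difference quotient in x."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)

def df_dy(x, y, h=1e-7):
    """Central difference quotient in y."""
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

k = 1e-3  # outer step, much larger than the inner step h

# Differentiate first with respect to x, then with respect to y, at the origin.
d2f_dydx = (df_dx(0.0, k) - df_dx(0.0, -k)) / (2.0 * k)
# Differentiate first with respect to y, then with respect to x, at the origin.
d2f_dxdy = (df_dy(k, 0.0) - df_dy(-k, 0.0)) / (2.0 * k)

print(d2f_dydx)  # close to -1, cf. (4.3)
print(d2f_dxdy)  # close to +1, cf. (4.4)
```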


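(4.3) Theorem. Consider a function $f : \mathbb{R}^2 \to \mathbb{R}$ for which the partial derivatives 
$\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial^2 f}{\partial y\,\partial x}$ exist in a neighborhood of $(x_0, y_0)$ with $\frac{\partial^2 f}{\partial y\,\partial x}$ being continuous at 
$(x_0, y_0)$. Then, $\frac{\partial^2 f}{\partial x\,\partial y}$ exists at $(x_0, y_0)$ and we have 

\[ \frac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0). \]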

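Proof. The idea is to consider a small rectangle with sides $h$ and $k$ and vertices 
$(x_0, y_0)$, $(x_0 + h, y_0)$, $(x_0, y_0 + k)$, $(x_0 + h, y_0 + k)$. The values of 
$f$ at the vertices are denoted by $f_{00}$, $f_{10}$, $f_{01}$, and $f_{11}$, respectively. The partial 
derivatives are approximately given by 

\[ \frac{\partial f}{\partial x}(x_0, y_0) \approx \frac{f_{10} - f_{00}}{h}, \qquad \frac{\partial f}{\partial x}(x_0, y_0 + k) \approx \frac{f_{11} - f_{01}}{h}, \tag{4.6} \]

\[ \frac{\partial^2 f}{\partial y\,\partial x} \approx \frac{\frac{\partial f}{\partial x}(x_0, y_0 + k) - \frac{\partial f}{\partial x}(x_0, y_0)}{k} \approx \frac{f_{11} - f_{01} - f_{10} + f_{00}}{h \cdot k}, \tag{4.7} \]

and similarly, 

\[ \frac{\partial^2 f}{\partial x\,\partial y} \approx \frac{\frac{\partial f}{\partial y}(x_0 + h, y_0) - \frac{\partial f}{\partial y}(x_0, y_0)}{h} \approx \frac{f_{11} - f_{10} - f_{01} + f_{00}}{k \cdot h}. \tag{4.8} \]

The expressions to the right of (4.7) and (4.8) are identical (Euler, “. . . huius 
theorematis veritatem exercitati facile perspiciant . . .”) and the statement of the 
theorem seems plausible. 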

In order to make the proof rigorous, we should replace the differences in (4.6) 
by Lagrange’s Theorem III.6.11. There is, however, a slight difficulty, because the 
intermediate points $\xi$ will not be the same for the two differences. To overcome 
this difficulty, we consider the function 

\[ g(x) := f(x, y_0 + k) - f(x, y_0), \tag{4.9} \]

apply Lagrange’s Theorem in the form $g(x_0 + h) - g(x_0) = h\, g'(\xi)$, and obtain 

\[ f_{11} - f_{10} - f_{01} + f_{00} = h \Big( \frac{\partial f}{\partial x}(\xi, y_0 + k) - \frac{\partial f}{\partial x}(\xi, y_0) \Big), \]

where $\xi$ lies between $x_0$ and $x_0 + h$. Next, we apply Lagrange’s Theorem to 
$\frac{\partial f}{\partial x}(\xi, y)$, considered this time as a function of $y$, and obtain 

\[ f_{11} - f_{10} - f_{01} + f_{00} = h \cdot k \cdot \frac{\partial^2 f}{\partial y\,\partial x}(\xi, \eta) \tag{4.10} \]

($\eta$ is between $y_0$ and $y_0 + k$). 

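Because of the continuity of $\frac{\partial^2 f}{\partial y\,\partial x}$ at $(x_0, y_0)$, it follows from (4.10) that for 
every $\varepsilon > 0$, there exists a $\delta > 0$ such that for $h^2 + k^2 \le \delta^2$, 

\[ \Big| \frac{f_{11} - f_{10} - f_{01} + f_{00}}{h \cdot k} - \frac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0) \Big| < \varepsilon. \]

For $k \to 0$ the differences $(f_{11} - f_{10})/k$ and $(f_{01} - f_{00})/k$ tend to $\frac{\partial f}{\partial y}(x_0 + h, y_0)$ 
and $\frac{\partial f}{\partial y}(x_0, y_0)$, respectively. Hence, we have, for $|h| < \delta$, 

\[ \Big| \frac{1}{h} \Big( \frac{\partial f}{\partial y}(x_0 + h, y_0) - \frac{\partial f}{\partial y}(x_0, y_0) \Big) - \frac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0) \Big| \le \varepsilon. \]

This, however, means that 

\[ \lim_{h \to 0} \frac{1}{h} \Big( \frac{\partial f}{\partial y}(x_0 + h, y_0) - \frac{\partial f}{\partial y}(x_0, y_0) \Big) = \frac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0), \]

and the statement of the theorem is established. 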
This theorem applied several times allows us to exchange higher order derivatives. 
For example, 

\[ \frac{\partial^3 f}{\partial x\,\partial y\,\partial x} = \frac{\partial^2}{\partial x\,\partial y} \Big( \frac{\partial f}{\partial x} \Big) = \frac{\partial^2}{\partial y\,\partial x} \Big( \frac{\partial f}{\partial x} \Big) = \frac{\partial^3 f}{\partial y\,\partial x^2}. \]

It also applies to functions of more than two variables. Indeed, we can always 
exchange two partial derivatives at a time, the other variables being kept constant. 


Taylor Series for Two Variables 


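Our next aim is to extend the Taylor series to functions of two variables. The idea 
(Cauchy 1829, p. 244) is to reduce the problem to one variable by connecting the 
points $(x_0, y_0)$ and $(x_0 + h, y_0 + k)$ by a straight line. We thus consider the function 

\[ g(t) := f(x_0 + th, y_0 + tk) \tag{4.11} \]

and apply Eq. (III.7.18) (Taylor series for one variable). For this we have to compute 
the derivatives of $g(t)$. If $f(x, y)$ is differentiable sufficiently often, the chain 
rule yields 

\[ g'(t) = \frac{\partial f}{\partial x}(x_0 + th, y_0 + tk)\, h + \frac{\partial f}{\partial y}(x_0 + th, y_0 + tk)\, k \tag{4.12} \]

and a further differentiation gives 

\[ g''(t) = \frac{\partial^2 f}{\partial x^2}(\ldots)\, hh + \frac{\partial^2 f}{\partial y\,\partial x}(\ldots)\, hk + \frac{\partial^2 f}{\partial x\,\partial y}(\ldots)\, kh + \frac{\partial^2 f}{\partial y^2}(\ldots)\, kk, \tag{4.13} \]

where the omitted argument of the partial derivatives of $f$ is $(x_0 + th, y_0 + tk)$. 
The two central terms in (4.13) are equal by Theorem 4.3 (further differentiation 
causes the appearance of the binomial coefficients). Inserting the above derivatives 
of $g(t)$ into, for example, 

\[ g(1) = g(0) + g'(0) + \frac{1}{2!}\, g''(0) + \frac{1}{3!}\, g'''(\theta) \]

(with $0 < \theta < 1$), yields 

\[
\begin{aligned}
f(x_0 + h, y_0 + k) = {} & f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)\, h + \frac{\partial f}{\partial y}(x_0, y_0)\, k \\
& + \frac{1}{2!} \Big( \frac{\partial^2 f}{\partial x^2}(x_0, y_0)\, h^2 + 2\, \frac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0)\, hk + \frac{\partial^2 f}{\partial y^2}(x_0, y_0)\, k^2 \Big) \\
& + \frac{1}{3!} \Big( \frac{\partial^3 f}{\partial x^3}(\xi, \eta)\, h^3 + 3\, \frac{\partial^3 f}{\partial x^2\,\partial y}(\xi, \eta)\, h^2 k + 3\, \frac{\partial^3 f}{\partial x\,\partial y^2}(\xi, \eta)\, h k^2 + \frac{\partial^3 f}{\partial y^3}(\xi, \eta)\, k^3 \Big),
\end{aligned}
\tag{4.14}
\]

where $\xi = x_0 + \theta h$ and $\eta = y_0 + \theta k$ are intermediate points. It is of course also 
possible to use Theorem III.7.13 with the remainder in integral form. 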


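(4.4) Example. We consider the function $f(x, y) = e^{-x^2 - y^2}$ (see also Example 
3.1), whose partial derivatives are 

\[
\begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= -2x\, e^{-x^2 - y^2}, & \frac{\partial f}{\partial y}(x, y) &= -2y\, e^{-x^2 - y^2}, \\
\frac{\partial^2 f}{\partial x^2}(x, y) &= (4x^2 - 2)\, e^{-x^2 - y^2}, & \frac{\partial^2 f}{\partial y^2}(x, y) &= (4y^2 - 2)\, e^{-x^2 - y^2}, \\
\frac{\partial^2 f}{\partial x\,\partial y}(x, y) &= \frac{\partial^2 f}{\partial y\,\partial x}(x, y) = 4xy\, e^{-x^2 - y^2}.
\end{aligned}
\]

If we neglect the remainder in (4.14) and put $x_0 = 0.9$, $y_0 = 1.2$, we obtain the 
quadratic approximation 

\[ f(0.9 + h, 1.2 + k) \approx e^{-2.25} \big( 1 - 1.8\, h - 2.4\, k + 0.62\, h^2 + 4.32\, hk + 1.88\, k^2 \big). \]

Fig. 4.1 compares this approximation to the function $f(x, y)$. The domain of the 
graph is restricted to $-1 \le x \le 2$, $-1 \le y \le 2$. 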
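
The coefficients of the quadratic approximation in Example 4.4 can be recomputed from the partial derivatives. The Python sketch below (standard library only; function names and test increments are illustrative assumptions) builds the second-order Taylor polynomial from (4.14) without the remainder and compares it with the stated approximation and with $f$ itself.

```python
import math

X0, Y0 = 0.9, 1.2  # expansion point of Example 4.4

def f(x, y):
    """f(x, y) = exp(-x^2 - y^2)."""
    return math.exp(-x * x - y * y)

def taylor2(h, k):
    """Second-order Taylor polynomial of f at (X0, Y0), from (4.14)."""
    e = f(X0, Y0)
    fx, fy = -2.0 * X0 * e, -2.0 * Y0 * e
    fxx = (4.0 * X0**2 - 2.0) * e
    fyy = (4.0 * Y0**2 - 2.0) * e
    fxy = 4.0 * X0 * Y0 * e
    return (e + fx * h + fy * k
            + 0.5 * (fxx * h * h + 2.0 * fxy * h * k + fyy * k * k))

def stated_approximation(h, k):
    """The quadratic approximation as printed in Example 4.4."""
    return math.exp(-2.25) * (1.0 - 1.8 * h - 2.4 * k
                              + 0.62 * h * h + 4.32 * h * k + 1.88 * k * k)

print(taylor2(0.3, -0.2), stated_approximation(0.3, -0.2))
print(f(X0 + 0.01, Y0 + 0.01) - taylor2(0.01, 0.01))  # third-order small
```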



FIGURE 4.1. Taylor’s approximation of second order for $f(x, y) = e^{-x^2 - y^2}$ 


Taylor Series for n Variables 

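We now extend our formulas to functions 

\[ f : \mathbb{R}^n \to \mathbb{R}^m, \]

where $f(x) = (f_1(x), \ldots, f_m(x))^T$ is composed of $m$ real functions of $x \in \mathbb{R}^n$. 
We fix $x_0 \in \mathbb{R}^n$, $h \in \mathbb{R}^n$ and apply the results of Sect. III.7 to $g(t) := f_i(x_0 + th)$. 
This yields, for example, 

\[
\begin{aligned}
f_i(x_0 + h) = {} & f_i(x_0) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(x_0)\, h_j + \frac{1}{2!} \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\partial^2 f_i}{\partial x_j\,\partial x_k}(x_0)\, h_j h_k \\
& + \frac{1}{3!} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{\ell=1}^{n} \frac{\partial^3 f_i}{\partial x_j\,\partial x_k\,\partial x_\ell}(x_0 + \theta_i h)\, h_j h_k h_\ell.
\end{aligned}
\tag{4.15}
\]

We can go even further, and write (formally, without considering convergence) 

\[ f_i(x_0 + h) = f_i(x_0) + \sum_{q \ge 1} \frac{1}{q!} \sum_{j_1=1}^{n} \sum_{j_2=1}^{n} \cdots \sum_{j_q=1}^{n} \frac{\partial^q f_i(x_0)}{\partial x_{j_1} \cdots \partial x_{j_q}}\, h_{j_1} h_{j_2} \cdots h_{j_q}. \]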

These formulas are rather cumbersome and call for a more compact notation,
which, in the words of Dieudonné, “does away with hordes of indices”. The linear
term in (4.15) is just the $i$th element of the product $f'(x_0)h$ (Jacobian matrix
with vector $h$). In order to simplify the quadratic term, we consider the bilinear
mapping $f''(x) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$, whose $i$th component, when applied to a pair
of vectors $u$ and $v$, is defined by

(4.16)  $$ \bigl(f''(x)(u,v)\bigr)_i := \sum_{j=1}^{n}\sum_{k=1}^{n} \frac{\partial^2 f_i}{\partial x_j\,\partial x_k}(x)\,u_j v_k . $$

Hence, the quadratic term in (4.15) is the $i$th element of the vector $f''(x_0)(h,h)$.
We can continue by interpreting higher derivatives as multilinear mappings. For
example, $f'''(x) : \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ is defined by

(4.17)  $$ \bigl(f'''(x)(u,v,w)\bigr)_i := \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{\ell=1}^{n} \frac{\partial^3 f_i}{\partial x_j\,\partial x_k\,\partial x_\ell}(x)\,u_j v_k w_\ell . $$

With this notation, formula (4.15) becomes

(4.18)  $$ f(x_0+h) = f(x_0) + f'(x_0)h + \frac{1}{2!} f''(x_0)(h,h) + R_3 . $$

For the remainder $R_3$ we may not write $R_3 = (1/3!)\,f'''(x_0 + \theta h)(h,h,h)$,
because the intermediate points $x_0 + \theta_i h$ in (4.15) might be different for each
component. However, we can use the integral representation (Theorem III.7.13) to
obtain

(4.19)  $$ R_3 = \int_0^1 \frac{(1-t)^2}{2!}\, f'''(x_0 + th)(h,h,h)\,dt . $$


(4.5) Remark. For a vector-valued function $g(t) = (g_1(t), \ldots, g_m(t))^T$ we use
the notation

(4.20)  $$ \int_0^1 g(t)\,dt := \Bigl( \int_0^1 g_1(t)\,dt, \ldots, \int_0^1 g_m(t)\,dt \Bigr)^T . $$

In what follows, we shall use the estimate

(4.21)  $$ \Bigl\| \int_0^1 g(t)\,dt \Bigr\| \le \int_0^1 \|g(t)\|\,dt , $$

which is obtained by considering Riemann sums and using the triangle inequality
as follows: $\bigl\| \sum_i g(\xi_i)\,\delta_i \bigr\| \le \sum_i \|g(\xi_i)\|\,\delta_i$.

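Estimation of the Remainder. Suppose we want to estimate the remainder $R_3$ of
(4.19). In view of (4.21), we have to estimate the expression $\|f'''(x)(h,h,h)\|$. For
the Euclidean norm this can be achieved by repeated application of the
Cauchy-Schwarz inequality. Denoting the expression of Eq. (4.17) by $a_i$, we have

$$
\begin{aligned}
a_i &= \sum_j b_{ij}\,u_j, & a_i^2 &\le \Bigl(\sum_j b_{ij}^2\Bigr)\,\|u\|^2, \\
b_{ij} &:= \sum_k c_{ijk}\,v_k, & b_{ij}^2 &\le \Bigl(\sum_k c_{ijk}^2\Bigr)\,\|v\|^2, \\
c_{ijk} &:= \sum_\ell d_{ijk\ell}\,w_\ell, & c_{ijk}^2 &\le \Bigl(\sum_\ell d_{ijk\ell}^2\Bigr)\,\|w\|^2,
\end{aligned}
$$

where $d_{ijk\ell} = \dfrac{\partial^3 f_i}{\partial x_j\,\partial x_k\,\partial x_\ell}$. Inserting the last inequality into the preceding
one, then $b_{ij}^2$ into the first inequality, yields

$$ a_i^2 \le \Bigl( \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{\ell=1}^{n} d_{ijk\ell}^2 \Bigr)\,\|u\|^2\,\|v\|^2\,\|w\|^2 . $$

Computing $\sum_i a_i^2$ and its square root, we obtain

(4.22)  $$ \|f'''(x)(u,v,w)\| \le M(x)\,\|u\|\,\|v\|\,\|w\| , $$

where

(4.23)  $$ M(x) = \sqrt{ \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{\ell=1}^{n} \Bigl( \frac{\partial^3 f_i}{\partial x_j\,\partial x_k\,\partial x_\ell}(x) \Bigr)^2 } . $$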

(4.6) Lemma. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be three times continuously differentiable; then
the remainder $R_3$ in Eq. (4.18) satisfies

$$ \|R_3\| \le \frac{\|h\|^3}{3!}\, \sup_{t \in [0,1]} M(x_0 + th), $$

where $M(x)$ is given by (4.23).





Proof. Applying the estimate (4.21) to (4.19) yields

$$ \|R_3\| \le \int_0^1 \frac{(1-t)^2}{2!}\, \|f'''(x_0 + th)(h,h,h)\|\,dt . $$

Because of (4.22), the expression $\|f'''(x_0 + th)(h,h,h)\|$ is at most equal to
$\sup_{t \in [0,1]} M(x_0 + th)\,\|h\|^3$, and the conclusion follows from Eq. (III.5.18),
since $\int_0^1 (1-t)^2/2!\,dt = 1/3!$. □


Maximum and Minimum Problems 

Our next aim is to extend the results of Sect. II.2 concerning necessary and
sufficient conditions for a local maximum (or minimum) to functions $z = f(x,y)$
of two variables. We have already seen in Sect. IV.3 (geometrical interpretation of
the gradient) that $\operatorname{grad} f(x_0,y_0) = 0$, i.e.,

(4.24)  $$ \frac{\partial f}{\partial x}(x_0,y_0) = 0, \qquad \frac{\partial f}{\partial y}(x_0,y_0) = 0, $$

is a necessary condition for a maximum (or minimum). Points satisfying (4.24)
are called stationary points of $f(x,y)$.

In a sufficiently small neighborhood of a stationary point $(x_0,y_0)$ (i.e., if
$|x - x_0|$ and $|y - y_0|$ are small), the remainder term in (4.14) may be neglected
and the condition

(4.25)  $$ \frac{\partial^2 f}{\partial x^2}(x_0,y_0)\,h^2 + 2\,\frac{\partial^2 f}{\partial x\,\partial y}(x_0,y_0)\,hk + \frac{\partial^2 f}{\partial y^2}(x_0,y_0)\,k^2 > 0 $$

guarantees that $f(x_0+h, y_0+k) > f(x_0,y_0)$ (if the function is only twice
continuously differentiable, we take one term fewer in the Taylor series and exploit the
continuity of the second partial derivatives). Therefore, we have a local minimum
if (4.25) holds for all $(h,k) \ne (0,0)$. If the expression in Eq. (4.25) is negative
for all $(h,k) \ne (0,0)$, we have a local maximum. In the case where (4.25) takes
positive and negative values depending on the choice of $(h,k)$, the function has a
saddle point at $(x_0,y_0)$, i.e., there are directions in which the function increases
and other directions in which it decreases.

In order to check whether a quadratic form $Ah^2 + 2Bhk + Ck^2$ is positive
for all $(h,k) \ne (0,0)$, we put $\lambda = h/k$ and consider $A\lambda^2 + 2B\lambda + C$. This
polynomial takes only positive values if $A > 0$ and $AC - B^2 > 0$, and only
negative values if $A < 0$ and $AC - B^2 > 0$. We have thus proved the following
result, which is from the very first paper published by the young Lagrange.

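(4.7) Theorem (Lagrange 1759). Let $f : \mathbb{R}^2 \to \mathbb{R}$ be twice continuously
differentiable and suppose that (4.24) is satisfied.

a) The point $(x_0,y_0)$ is a local minimum if, at $(x_0,y_0)$,

(4.26)  $$ \frac{\partial^2 f}{\partial x^2} > 0 \quad\text{and}\quad \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \Bigl( \frac{\partial^2 f}{\partial x\,\partial y} \Bigr)^2 > 0 . $$

b) The point $(x_0,y_0)$ is a local maximum if, at $(x_0,y_0)$,

(4.27)  $$ \frac{\partial^2 f}{\partial x^2} < 0 \quad\text{and}\quad \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \Bigl( \frac{\partial^2 f}{\partial x\,\partial y} \Bigr)^2 > 0 . $$

c) In the case where

(4.28)  $$ \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \Bigl( \frac{\partial^2 f}{\partial x\,\partial y} \Bigr)^2 < 0 $$

at $(x_0,y_0)$, this point is a saddle point.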


(4.8) Example. The function

(4.29)  $$ f(x,y) = x^3 + y^3 - 3xy $$

creates the famous “folium cartesii” (letter of Descartes to Mersenne, Aug. 23,
1638). Its level curves are plotted in Fig. 4.2. Computing the partial derivatives

$$ \frac{\partial f}{\partial x}(x,y) = 3x^2 - 3y, \qquad \frac{\partial f}{\partial y}(x,y) = 3y^2 - 3x, $$

we see that the function (4.29) has two stationary points, namely $(0,0)$ and $(1,1)$.
Checking the sufficient conditions of Theorem 4.7 shows that $(0,0)$ is a saddle
point and that $(1,1)$ is a local minimum (see also Fig. 4.2).

FIGURE 4.2. Level curves for the Cartesian Folium (4.29)
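The test of Theorem 4.7 is easy to mechanize. The sketch below is an illustration only (the helper names `gradient`, `second_derivatives` and `classify` are not from the text); it applies conditions (4.26) to (4.28) to the two stationary points of the folium (4.29), using the second derivatives $f_{xx} = 6x$, $f_{xy} = -3$, $f_{yy} = 6y$.

```python
def gradient(x, y):
    """Gradient of f(x, y) = x^3 + y^3 - 3xy."""
    return (3.0 * x * x - 3.0 * y, 3.0 * y * y - 3.0 * x)


def second_derivatives(x, y):
    """Returns (f_xx, f_xy, f_yy) for f(x, y) = x^3 + y^3 - 3xy."""
    return (6.0 * x, -3.0, 6.0 * y)


def classify(fxx, fxy, fyy):
    """Classify a stationary point by the sign conditions of Theorem 4.7."""
    det = fxx * fyy - fxy * fxy
    if det > 0 and fxx > 0:
        return "local minimum"
    if det > 0 and fxx < 0:
        return "local maximum"
    if det < 0:
        return "saddle point"
    # det == 0: the theorem gives no information
    return "undecided"
```

The case `det == 0` is reported as undecided because Theorem 4.7 makes no statement there.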


Extension to n Variables. Consider real-valued functions $z = f(x_1, \ldots, x_n)$
with more than two variables. We have seen in Sect. IV.3 that a necessary condition
for a local extremum (maximum or minimum) at $x_0 \in \mathbb{R}^n$ is

(4.30)  $$ \operatorname{grad} f(x_0) = 0 . $$

To obtain sufficient conditions, we must study the quadratic term in (4.15). With
$h = (h_1, \ldots, h_n)^T$, this term can be written as $(h^T H(x_0) h)/2$, where

(4.31)
$$
H(x) = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n}(x) \\
\vdots & & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}(x)
\end{pmatrix}
$$

is the so-called Hessian matrix (Hesse 1857, Crelle J. f. Math., vol. 54, p. 251). If
the assumptions of Theorem 4.3 are satisfied, this matrix is symmetric.

If, in addition to (4.30), the matrix (4.31) is “positive definite” at $x_0$, i.e.,
$h^T H(x_0) h > 0$ for all $h \ne 0$, then the point $x_0$ is a local minimum. A stationary
point $x_0$ is a local maximum if $H(x_0)$ is “negative definite”, i.e., $h^T H(x_0) h < 0$
for all $h \ne 0$. For the verification of positive (negative) definiteness of a matrix of
dimension $\ge 3$ we refer to the standard literature on Linear Algebra, e.g., Halmos
(1958, p. 141, 153).

Conditional Minimum (Lagrange Multiplier)

Problem. Find a local maximum (or minimum) of a function $f(x,y)$ subject to a
constraint $g(x,y) = 0$. If we denote the level set of $g$ by $A = \{(x,y) \mid g(x,y) = 0\}$,
this means that we have to find $(x_0,y_0) \in A$ such that $f(x,y) \le f(x_0,y_0)$ for
$(x,y) \in A$.

A direct approach would be to solve the equation $g(x,y) = 0$ for $y$ in order
to obtain $y = G(x)$ (see the Implicit Function Theorem 3.8) and to look for
an extremum of $F(x) = f(x, G(x))$. More generally, we could try to find a
parameterization $(x(t), y(t))$ of the level curve $A$ and consider the function $F(t) =
f(x(t), y(t))$. A necessary condition for an extremum at $(x_0,y_0) = (x(t_0), y(t_0))$
is $F'(t_0) = 0$, i.e.,

(4.32)  $$ \frac{\partial f}{\partial x}(x_0,y_0)\,x'(t_0) + \frac{\partial f}{\partial y}(x_0,y_0)\,y'(t_0) = 0 . $$

This is an equation for $t_0$ and sorts out possible candidates for the solution.
However, this approach is often impracticable, because a suitable parameterization is
difficult to obtain.

Lagrange's Idea (Lagrange 1788, première partie, Sect. IV, §1, Oeuvres, vol. 11,
p. 78). We observe from (4.32) that $\operatorname{grad} f(x_0,y_0)$ is orthogonal to the tangent
vector $(x'(t_0), y'(t_0))$ of the level curve $A$. Hence (see Sect. IV.3), at a local
extremum, the vectors $\operatorname{grad} f(x_0,y_0)$ and $\operatorname{grad} g(x_0,y_0)$ have the same direction
(see Fig. 4.3), and we get the necessary condition

(4.33)  $$ \operatorname{grad} f(x_0,y_0) = \lambda\,\operatorname{grad} g(x_0,y_0), \qquad g(x_0,y_0) = 0 $$

(if $\operatorname{grad} g(x_0,y_0) \ne 0$). The parameter $\lambda$ is called a Lagrange multiplier.
Equations (4.33) represent three conditions for the three parameters $x_0$, $y_0$, $\lambda$. With the
function

(4.34)  $$ \mathcal{L}(x,y,\lambda) := f(x,y) - \lambda\,g(x,y), $$

condition (4.33) can be expressed elegantly as

(4.35)  $$ \operatorname{grad} \mathcal{L}(x_0,y_0,\lambda) = 0 . $$


FIGURE 4.3. Conditional maximum for $f(x,y) = x + 2y$, $p = 3$

(4.9) Example. Let positive numbers $a$, $b$ and $p > 1$ be given. Compute the
maximum of

(4.36)  $$ f(x,y) = ax + by $$

in the region $x > 0$, $y > 0$, subject to the constraint $g(x,y) = x^p + y^p - 1 = 0$
(see Fig. 4.3). Using Lagrange's idea, we consider the function $\mathcal{L}(x,y,\lambda) = ax +
by - \lambda(x^p + y^p - 1)$, and the necessary condition (4.35) becomes

(4.37)  $$ a - p\lambda x_0^{p-1} = 0, \qquad b - p\lambda y_0^{p-1} = 0, \qquad x_0^p + y_0^p = 1 . $$

The first two relations yield

(4.38)  $$ x_0 = \Bigl( \frac{a}{p\lambda} \Bigr)^{1/(p-1)}, \qquad y_0 = \Bigl( \frac{b}{p\lambda} \Bigr)^{1/(p-1)}, $$

and by inserting these values into the last relation of (4.37), we obtain

(4.39)  $$ a^q + b^q = (p\lambda)^q, $$

where

(4.40)  $$ q = \frac{p}{p-1}, \qquad\text{i.e.,}\qquad \frac{1}{p} + \frac{1}{q} = 1 . $$

Equation (4.39) allows us to compute $\lambda$. Inserting the result into (4.38), we finally
obtain the solution

$$ x_0 = \frac{a^{q/p}}{(a^q + b^q)^{1/p}}, \qquad y_0 = \frac{b^{q/p}}{(a^q + b^q)^{1/p}}, $$

which by Fig. 4.3 can be seen to yield the desired maximum.
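As a plausibility check of Example 4.9, the closed-form solution can be compared with a brute-force search along the constraint curve. This is a sketch under the assumption $1/p + 1/q = 1$ used in (4.40); the names `lagrange_solution` and `brute_force_max` are illustrative and do not appear in the text.

```python
def lagrange_solution(a, b, p):
    """Closed-form maximizer of a*x + b*y on x^p + y^p = 1 (x, y > 0).

    Returns (x0, y0, maximum), with q defined by 1/p + 1/q = 1.
    """
    q = p / (p - 1.0)
    s = a ** q + b ** q
    x0 = a ** (q / p) / s ** (1.0 / p)
    y0 = b ** (q / p) / s ** (1.0 / p)
    return x0, y0, s ** (1.0 / q)


def brute_force_max(a, b, p, n=20000):
    """Largest value of a*x + b*y over a grid of points on the curve
    y = (1 - x^p)^(1/p), 0 <= x <= 1."""
    best = 0.0
    for i in range(n + 1):
        x = i / n
        y = (1.0 - x ** p) ** (1.0 / p)
        best = max(best, a * x + b * y)
    return best
```

With `a = 1`, `b = 2`, `p = 3` (the data of Fig. 4.3) the grid search should never exceed the closed-form maximum $(a^q + b^q)^{1/q}$ and should approach it as the grid is refined.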


Hölder's Inequality (Hölder 1889). Let $\xi$, $\eta$ and $p > 1$ be positive numbers. Then

$$ x = \frac{\xi}{(\xi^p + \eta^p)^{1/p}}, \qquad y = \frac{\eta}{(\xi^p + \eta^p)^{1/p}} $$

satisfy $x^p + y^p = 1$, and it follows from Example 4.9 that

$$ \frac{a\xi + b\eta}{(\xi^p + \eta^p)^{1/p}} = ax + by \le ax_0 + by_0 = (a^q + b^q)^{1/q} . $$

We thus obtain

$$ a\xi + b\eta \le (\xi^p + \eta^p)^{1/p}\,(a^q + b^q)^{1/q}, $$

where $p$ and $q$ are related by (4.40). By induction on $n$, this inequality can be
generalized to

(4.42)  $$ \sum_{i=1}^{n} x_i y_i \le \Bigl( \sum_{i=1}^{n} x_i^p \Bigr)^{1/p} \Bigl( \sum_{i=1}^{n} y_i^q \Bigr)^{1/q} $$

for positive numbers $x_i$ and $y_i$. This is the so-called Hölder inequality. For $p =
q = 2$, it reduces to the Cauchy-Schwarz inequality (1.5).
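Hölder's inequality lends itself to a quick numerical experiment. The following sketch (function names are illustrative, and the random test vectors are an assumption made only for this check) evaluates the difference between the right and left sides of the inequality for positive vectors; it should never be negative beyond rounding error.

```python
import random


def p_norm(x, p):
    """The p-norm (sum |x_i|^p)^(1/p) of a list of numbers."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def holder_gap(x, y, p):
    """Right side minus left side of Hoelder's inequality, with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    return p_norm(x, p) * p_norm(y, q) - sum(abs(s * t) for s, t in zip(x, y))


def smallest_holder_gap(p, trials=200, n=5, seed=0):
    """Smallest gap observed over random positive vectors of length n."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        x = [rng.uniform(0.1, 3.0) for _ in range(n)]
        y = [rng.uniform(0.1, 3.0) for _ in range(n)]
        gaps.append(holder_gap(x, y, p))
    return min(gaps)
```

For $p = 2$ and proportional vectors such as `[1, 2]` and `[2, 4]` the gap vanishes, which is the equality case of the Cauchy-Schwarz inequality.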

With (4.42), we can prove the triangle inequality for the norm $\|x\|_p$ of
Eq. (1.9). Indeed, for two vectors $x, y \in \mathbb{R}^n$, we have

$$ \|x+y\|_p^p = \sum_{i=1}^{n} |x_i + y_i|^p \le \sum_{i=1}^{n} |x_i| \cdot |x_i + y_i|^{p-1} + \sum_{i=1}^{n} |y_i| \cdot |x_i + y_i|^{p-1} . $$

We apply (4.42) to the two sums on the right side of this inequality and obtain

$$ \sum_{i=1}^{n} |x_i| \cdot |x_i + y_i|^{p-1} \le \Bigl( \sum_{i=1}^{n} |x_i|^p \Bigr)^{1/p} \Bigl( \sum_{i=1}^{n} |x_i + y_i|^{(p-1)q} \Bigr)^{1/q} = \|x\|_p \cdot \|x+y\|_p^{p-1}, $$

where we have used $(p-1)q = p$. This yields $\|x+y\|_p^p \le (\|x\|_p + \|y\|_p) \cdot \|x+y\|_p^{p-1}$ and hence the triangle
inequality $\|x+y\|_p \le \|x\|_p + \|y\|_p$.

Exercises

4.1 (H.A. Schwarz 1873). Show that for

$$
f(x,y) = \begin{cases}
x^2 \arctan\dfrac{y}{x} - y^2 \arctan\dfrac{x}{y} & \text{if } xy \ne 0, \\[1ex]
0 & \text{if } xy = 0,
\end{cases}
$$

the second partial derivatives at the origin are different: $\dfrac{\partial^2 f}{\partial x\,\partial y} \ne \dfrac{\partial^2 f}{\partial y\,\partial x}$.

4.2 Show that Taylor's formula (4.14) only holds if all partial derivatives involved
are continuous. This is in contrast to the case of one variable (see, e.g.,
Theorem III.6.11). The following counterexample by Peano (1884, “Annotazione
N. 109”),

$$
f(x,y) = \begin{cases}
\dfrac{xy}{\sqrt{x^2+y^2}} & \text{if } x^2 + y^2 \ne 0, \\[1ex]
0 & \text{otherwise,}
\end{cases}
\qquad x_0 = y_0 = -a, \quad h = k = a + b,
$$

shows that Eq. (4.14), written with the first-order error term

$$ f(x_0+h, y_0+k) = f(x_0,y_0) + \frac{\partial f}{\partial x}(\xi,\eta)\,h + \frac{\partial f}{\partial y}(\xi,\eta)\,k, $$

where $\xi = x_0 + \theta h$ and $\eta = y_0 + \theta k$ are intermediate points, might be wrong.
This corrected an error in Serret's book.

4.3 Analyze for Example 4.4 the intersections of the graph of $f(x,y)$ with that
of its Taylor approximation of order 2 in the neighborhood of $(x_0,y_0)$ and
explain the star-shaped curves (see Fig. 4.1). Why do you think the authors
chose the point $(0.9, 1.2)$ for their figure and not, as in Fig. 3.1, the point
$(0.8, 1.0)$?

Hint. Use the error formula in (4.14).

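4.4 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a differentiable function that satisfies

$$ \operatorname{grad} f(x) = g(x) \cdot x^T, $$

where $g : \mathbb{R}^2 \to \mathbb{R}$. Show that $f$ is constant on the circle $\{x \in \mathbb{R}^2 \,;\, \|x\| = r\}$.

4.5 Show that $U = (x^2 + y^2 + z^2)^{-1/2}$ satisfies the differential equation of
Laplace

$$ \frac{\partial^2 U}{\partial x^2} + \frac{\partial^2 U}{\partial y^2} + \frac{\partial^2 U}{\partial z^2} = 0 $$

for $x^2 + y^2 + z^2 > 0$.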


4.6 Find the stationary points of the function

$$ f(x,y) = (x^2 + y^2)^2 - 8xy $$

and study the level curves $f(x,y) = \text{Const}$ in their neighborhood. (Any
similarity of these curves with curves already seen is intentional.)

4.7 Find the maximum value of $\sqrt[3]{xyz}$ subject to $(x + y + z)/3 = 1$. What
conclusion can be drawn from this result? (We have already seen in Example
4.9 that the computation of a conditional maximum is an excellent tool for
obtaining interesting inequalities.)

4.8 Find the maxima or minima of $x^2 + y^2 + z^2$ subject to the conditions


Remark. If there are two conditions to satisfy, you will have to introduce two 
Lagrange multipliers. 


4.9 Let 


(V2+1 1 \ 

1 0 V2 


be the matrix of the example in Sect. IV.2. Find the maximum of the function 
f(x) = \\Ax\\ 2 subject to ||x ||2 -1 = 0. The result is the value of ||A|| 2 , 
defined in Eq. (2.14). 

4.10 Show that the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

$$ f(x,y) = (y - x^2)(y - 2x^2) $$

has the origin as a stationary point, but not as a local minimum. Nevertheless,
on all straight lines through the origin, the function has a local minimum.
With this counterexample, Peano (1884, “Annotazioni N. 133-136”)
corrected another error in Serret's book. Such irreverent criticism of the work
of the greatest French mathematicians by a 25-year-old Italian “nobody” did
not delight everybody (see, e.g., Peano's Opere, p. 40-46).

IV.5 Multiple Integrals

We know that the evaluation or even only the reduction of multiple integrals 
generally presents very considerable difficulties . . . 

(Dirichlet 1839, Werke, vol. I, p. 377) 

The Riemann integral for a function of one variable (Sect. III.5) represents the
area between the function and the $x$-axis. We shall extend this concept to
functions $f : A \to \mathbb{R}$ (where $A \subset \mathbb{R}^2$) of two variables in such a way that the integral
represents the volume between the surface $z = f(x,y)$ and the $(x,y)$-plane. Many
definitions and results of Sect. III.5 can be extended straightforwardly. However,
additional technical difficulties occur, because domains in $\mathbb{R}^2$ are often more
complicated than those in $\mathbb{R}$ (see Fig. 5.1). The extension to functions of more than two
variables is then more or less straightforward.



FIGURE 5.1. Possible domains in $\mathbb{R}^2$ (panels: rectangle, nonconvex, nonconnected)


Double Integrals over a Rectangle 

We begin by considering functions $f : I \to \mathbb{R}$, whose domain $I = [a,b] \times [c,d] =
\{(x,y) \mid a \le x \le b,\ c \le y \le d\}$ is a closed and bounded rectangle in $\mathbb{R}^2$, and we
assume that the function is bounded, i.e., that

(5.1)  $$ \exists\, M \ge 0 \quad \forall\, (x,y) \in I \qquad |f(x,y)| \le M . $$

We consider divisions

(5.2)  $$ D_x = \{x_0, x_1, \ldots, x_n\} \ \text{of } [a,b], \qquad D_y = \{y_0, y_1, \ldots, y_m\} \ \text{of } [c,d], $$

where $a = x_0 < x_1 < \ldots < x_n = b$ and $c = y_0 < y_1 < \ldots < y_m = d$, denote
the small rectangle displayed in Fig. 5.2 by $I_{ij} = [x_{i-1}, x_i] \times [y_{j-1}, y_j]$, and its
area by

(5.3)  $$ \mu(I_{ij}) = (x_i - x_{i-1})(y_j - y_{j-1}) . $$

FIGURE 5.2. Division of a rectangle together with $I_{ij}$


Using the notation

(5.4)  $$ f_{ij} = \inf_{(x,y) \in I_{ij}} f(x,y), \qquad F_{ij} = \sup_{(x,y) \in I_{ij}} f(x,y), $$

we then define lower and upper sums by

(5.5)  $$ s(D_x \times D_y) = \sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}\,\mu(I_{ij}), \qquad S(D_x \times D_y) = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}\,\mu(I_{ij}) . $$

If we add points to the division $D_x$ (or to $D_y$), then the lower sum does not
decrease and the upper sum does not increase (cf. Lemma III.5.1). Furthermore, a
lower sum can never be larger than an upper sum (Lemma III.5.2). Hence, the
following definition makes sense.
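The lower and upper sums (5.5) can be computed directly in simple cases. The sketch below is an illustration, not the authors' construction: it assumes a uniform division and a function that is nondecreasing in each variable on the rectangle, so that the infimum and supremum over each cell are attained at its lower-left and upper-right corners. The name `lower_upper_sums` is hypothetical.

```python
def lower_upper_sums(f, a, b, c, d, n, m):
    """Lower and upper sums on a uniform n-by-m division of [a,b] x [c,d].

    Assumption: f is nondecreasing in x and in y on the rectangle, so the
    infimum on each cell is f(lower-left corner) and the supremum is
    f(upper-right corner).
    """
    hx = (b - a) / n
    hy = (d - c) / m
    area = hx * hy
    lower = 0.0
    upper = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x_lo, x_hi = a + (i - 1) * hx, a + i * hx
            y_lo, y_hi = c + (j - 1) * hy, c + j * hy
            lower += f(x_lo, y_lo) * area
            upper += f(x_hi, y_hi) * area
    return lower, upper
```

For $f(x,y) = xy$ on the unit square, refining the division from 10 to 20 points per side raises the lower sum and lowers the upper sum, with the value $1/4$ of the integral always in between.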

(5.1) Definition. Let $f : I \to \mathbb{R}$ satisfy (5.1). If

(5.6)  $$ \sup_{(D_x, D_y)} s(D_x \times D_y) = \inf_{(D_x, D_y)} S(D_x \times D_y), $$

then $f(x,y)$ is integrable on $I$ and the value (5.6) is denoted by

(5.7)  $$ \int_I f(x,y)\,d(x,y) \qquad\text{or}\qquad \iint_I f(x,y)\,d(x,y) . $$

As a consequence of this definition and of the aforementioned properties, we
have that $f : I \to \mathbb{R}$ is integrable if and only if (see Theorem III.5.4)

(5.8)  $$ \forall\, \varepsilon > 0 \quad \exists\, (D_x, D_y) \qquad S(D_x \times D_y) - s(D_x \times D_y) < \varepsilon . $$

The theorem of Du Bois-Reymond (Theorem III.5.8) also has its analog.

(5.2) Theorem. Let $\mathcal{D}_\delta$ be the set of all pairs of divisions $(D_x, D_y)$ such that
$\max_i (x_i - x_{i-1}) \le \delta$ and $\max_j (y_j - y_{j-1}) \le \delta$. A function $f : I \to \mathbb{R}$ satisfying
(5.1) is integrable if and only if

$$ \forall\, \varepsilon > 0 \quad \exists\, \delta > 0 \quad \forall\, (D_x, D_y) \in \mathcal{D}_\delta \qquad S(D_x \times D_y) - s(D_x \times D_y) < \varepsilon . $$

Proof. For an $\varepsilon > 0$ let $(\tilde D_x, \tilde D_y)$ be given by (5.8). This induces a grid whose
length (in the interior of $[a,b] \times [c,d]$) is $L = (\tilde n - 1)(d - c) + (\tilde m - 1)(b - a)$
(see Fig. 5.3, left picture). We then take an arbitrary division $(D_x, D_y) \in \mathcal{D}_\delta$, set
$\Delta = S(D_x \times D_y) - s(D_x \times D_y)$, and put $D'_x = D_x \cup \tilde D_x$, $D'_y = D_y \cup \tilde D_y$,
$\Delta' = S(D'_x \times D'_y) - s(D'_x \times D'_y)$. We then get, exactly as in Eq. (III.5.10) (see
Fig. 5.3, right picture),

$$ \Delta \le \Delta' + L \cdot \delta \cdot 2M . $$

The conclusion is now the same as in the proof of Theorem III.5.8. □


FIGURE 5.3. Division $\tilde D_x \times \tilde D_y$ (left); division $D'_x \times D'_y$, elements $I'_{ij}$ of $D'_x \times D'_y$ that
intersect $\tilde D_x \times \tilde D_y$ (right)

Let $\xi_1, \ldots, \xi_n$ be such that $x_{i-1} \le \xi_i \le x_i$ and $\eta_1, \ldots, \eta_m$ be such that
$y_{j-1} \le \eta_j \le y_j$. It then follows from Theorem 5.2 that

(5.9)  $$ \Bigl| \sum_{i=1}^{n}\sum_{j=1}^{m} f(\xi_i, \eta_j)(x_i - x_{i-1})(y_j - y_{j-1}) - \iint_I f(x,y)\,d(x,y) \Bigr| < \varepsilon, $$

provided that $\max_i (x_i - x_{i-1}) \le \delta$ and $\max_j (y_j - y_{j-1}) \le \delta$. This is true because
the sum and the integral in (5.9) both lie between $s(D_x \times D_y)$ and $S(D_x \times D_y)$.

Iterated Integrals. The inner sum in Eq. (5.9), namely $\sum_{j=1}^{m} f(\xi_i, \eta_j)(y_j -
y_{j-1})$, is a Riemann sum for the function $f(\xi_i, y)$. Assuming this function to be
integrable (in the sense of Definition III.5.3) for all $i$, we obtain from (5.9) that

(5.10)  $$ \Bigl| \sum_{i=1}^{n} \int_c^d f(\xi_i, y)\,dy\,(x_i - x_{i-1}) - \iint_I f(x,y)\,d(x,y) \Bigr| \le \varepsilon . $$

Here, we are again confronted with a Riemann sum, this time for the function
$x \mapsto \int_c^d f(x,y)\,dy$. The estimate (5.10) expresses the fact that the Riemann sums
converge to $\iint_I f(x,y)\,d(x,y)$ if $\max_i (x_i - x_{i-1}) \to 0$. Hence, we have
(Exercise 5.1)

(5.11)  $$ \int_a^b \Bigl( \int_c^d f(x,y)\,dy \Bigr) dx = \iint_I f(x,y)\,d(x,y) $$

and have proved the following result.

(5.3) Theorem (Stolz 1886, p. 93). Let $f : I \to \mathbb{R}$ be integrable and assume that
for each $x \in [a,b]$ the function $y \mapsto f(x,y)$ is integrable on $[c,d]$. Then, the
function $x \mapsto \int_c^d f(x,y)\,dy$ is integrable on $[a,b]$ and identity (5.11) holds. □

Consequently, the computation of a double integral is reduced to the
computation of two simple (iterated) integrals and the techniques developed in Sects. II.4,
II.5, and III.5 can be applied. By symmetry, we also have

(5.12)  $$ \int_c^d \Bigl( \int_a^b f(x,y)\,dx \Bigr) dy = \iint_I f(x,y)\,d(x,y), $$

provided that $f : I \to \mathbb{R}$ is integrable and that the function $x \mapsto f(x,y)$ is
integrable on $[a,b]$ for each $y \in [c,d]$. The two identities (5.11) and (5.12) together
show that the iterated integrals are independent of the order of integration (under
the stated assumptions).
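The independence of the order of integration can be observed numerically. The following sketch approximates both iterated integrals by midpoint Riemann sums; the midpoint rule and the function names are choices made here for illustration and are not prescribed by the text.

```python
def midpoint_nodes(lo, hi, n):
    """Midpoints and width of n equal subintervals of [lo, hi]."""
    h = (hi - lo) / n
    return [lo + (i + 0.5) * h for i in range(n)], h


def iterated_dy_dx(f, a, b, c, d, n=200, m=200):
    """Approximates the iterated integral (5.11): inner in y, outer in x."""
    xs, hx = midpoint_nodes(a, b, n)
    ys, hy = midpoint_nodes(c, d, m)
    return sum(sum(f(x, y) for y in ys) * hy for x in xs) * hx


def iterated_dx_dy(f, a, b, c, d, n=200, m=200):
    """Approximates the iterated integral (5.12): inner in x, outer in y."""
    xs, hx = midpoint_nodes(a, b, n)
    ys, hy = midpoint_nodes(c, d, m)
    return sum(sum(f(x, y) for x in xs) * hx for y in ys) * hy
```

For the continuous function $f(x,y) = xy^2$ on $[0,1] \times [0,2]$ both approximations should be close to the exact value $4/3$ and agree with each other up to rounding.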

Counterexamples. We shall show that the existence of one of the integrals in
(5.11) does not necessarily imply the existence of the other.

FIGURE 5.4a. Nonintegrable function. FIGURE 5.4b. Integrable function


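1) Let $f : [0,1] \times [0,1] \to \mathbb{R}$ be defined by (Fig. 5.4a)

(5.13)
$$
f(x,y) = \begin{cases}
1 & \text{if } (x,y) = \Bigl( \dfrac{2k-1}{2^n}, \dfrac{2\ell-1}{2^n} \Bigr) \text{ with integers } n, k, \ell, \\[1ex]
0 & \text{else.}
\end{cases}
$$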






For a fixed $x \in [0,1]$ there are only a finite number of points with $f(x,y) \ne 0$.
Hence, $\int_0^1 f(x,y)\,dy = 0$ and the iterated integral to the left of (5.11) exists.
However, every rectangle $[x_{i-1}, x_i] \times [y_{j-1}, y_j]$ contains points with $f(x,y) = 1$
and points with $f(x,y) = 0$. Consequently, $s(D_x \times D_y) = 0$ and $S(D_x \times D_y) = 1$
for all divisions and the integral to the right of (5.11) does not exist.

2) The function (Fig. 5.4b)

(5.14)
$$
f(x,y) = \begin{cases}
1 & \text{if } (x = 0 \text{ or } x = 1) \text{ and } y \in \mathbb{Q}, \\
1 & \text{if } (y = 0 \text{ or } y = 1) \text{ and } x \in \mathbb{Q}, \\
0 & \text{else}
\end{cases}
$$

is integrable, because the points with $f(x,y) \ne 0$ form a set that can be neglected
(see below). But, for $x = 0$ or $x = 1$, the function $y \mapsto f(x,y)$ is the Dirichlet
function of Example III.5.6, which is not integrable.

Null Sets and Discontinuous Functions

Continuous functions $f : I \to \mathbb{R}$ are uniformly continuous ($I$ is compact,
Theorem 2.5) and hence integrable. The proof of this fact is the same as for Theorem
III.5.10. In the sequel, we shall prove the integrability of functions whose set of
discontinuities is not too large.

(5.4) Definition. A set $X \subset I \subset \mathbb{R}^2$ is said to be a null set if for every $\varepsilon > 0$ there
exist finitely many rectangles $I_k = [a_k, b_k] \times [c_k, d_k]$ $(k = 1, \ldots, n)$ such that

(5.15)  $$ X \subset \bigcup_{k=1}^{n} I_k \qquad\text{and}\qquad \sum_{k=1}^{n} \mu(I_k) < \varepsilon . $$


Typical null sets are the boundaries of “regular” sets, e.g., triangles, disks,
polygons (see the example of Fig. 5.5¹). This is a consequence of the following
result.

¹ A null set only in the strict mathematical sense, of course!

(5.5) Lemma. Let $\varphi : [0,1] \to \mathbb{R}^2$ represent a curve in the plane and suppose that

(5.16)  $$ \|\varphi(s) - \varphi(t)\|_\infty \le M \cdot |s - t| \qquad\text{for all } s, t \in [0,1] . $$

Then, the image set $\varphi([0,1])$ is a null set.

Proof. We divide $[0,1]$ into $n$ equidistant intervals $J_1, J_2, \ldots, J_n$ of length $1/n$.
For $s, t \in J_k$ we have $\|\varphi(s) - \varphi(t)\|_\infty \le M/n$, i.e., $\varphi(J_k)$ is contained in a
square $I_k$ of side $\le 2M/n$. Therefore, the entire curve is contained in a union of
$n$ squares $I_k$, whose area is bounded by

$$ \sum_{k=1}^{n} \mu(I_k) \le \sum_{k=1}^{n} \Bigl( \frac{2M}{n} \Bigr)^2 = \frac{4M^2}{n} < \varepsilon, $$

if $n$ is sufficiently large. This proves (5.15). □
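The covering used in the proof of Lemma 5.5 can be reproduced for a concrete curve. As an illustration (the choice of the unit circle and all names below are assumptions made here), take $\varphi(s) = (\cos 2\pi s, \sin 2\pi s)$, which satisfies (5.16) with $M = 2\pi$, and cover it by $n$ squares of side $2M/n$ centered at the points $\varphi(k/n)$.

```python
import math

M = 2.0 * math.pi  # Lipschitz constant of phi in the maximum norm


def phi(s):
    """Parameterization of the unit circle on [0, 1]."""
    return (math.cos(2.0 * math.pi * s), math.sin(2.0 * math.pi * s))


def covering_area(n):
    """Total area of n squares of side 2M/n, i.e. 4 M^2 / n."""
    return n * (2.0 * M / n) ** 2


def covering_contains_curve(n, samples_per_interval=50):
    """Checks on sample points that phi(J_k) lies in the square of
    half-side M/n centered at phi(k/n), for every interval J_k."""
    half = M / n
    for k in range(n):
        cx, cy = phi(k / n)
        for r in range(samples_per_interval + 1):
            s = (k + r / samples_per_interval) / n
            x, y = phi(s)
            if max(abs(x - cx), abs(y - cy)) > half + 1e-12:
                return False
    return True
```

The total area $4M^2/n$ decreases like $1/n$, so for any $\varepsilon > 0$ a sufficiently large $n$ gives a covering of area below $\varepsilon$, as required by (5.15).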


Condition (5.16) is sufficient, but not necessary, for a curve to be a null set.
For example, von Koch's curve (von Koch 1906) of Fig. 5.6 is a null set (see
Exercise 5.5) that has infinite length (hence, (5.16) cannot be satisfied). The curve of
Peano-Hilbert (Fig. 2.3) is not a null set, of course. However, Sierpinski's triangle
and carpet (Fig. 1.9 and Fig. 1.10) are other interesting examples of null sets.



FIGURE 5.6. A null set, the curve of von Koch

(5.6) Theorem. Let $f : I \to \mathbb{R}$ be a bounded function (satisfying (5.1)) and define

$$ X = \{(x,y) \in I \,;\, f \text{ is not continuous at } (x,y)\} . $$

If $X$ is a null set, then the function $f(x,y)$ is integrable.

Proof. Let $\varepsilon > 0$ be given and let $\bigcup_{k=1}^{n} I_k$ be a finite covering of $X$ satisfying
(5.15). We enlarge the $I_k$ slightly and consider open rectangles $J_1, \ldots, J_n$ such
that $J_k \supset I_k$ for all $k$ and $\sum_{k=1}^{n} \mu(J_k) < 2\varepsilon$. The set $H := I \setminus \bigcup_{k=1}^{n} J_k$ is
then closed (Theorems 1.15 and 1.14) and therefore compact (Theorem 1.19).
Restricted to $H$, the function $f(x,y)$ is uniformly continuous (Theorem 2.5), which
means that there exists a $\delta > 0$ such that $|f(x,y) - f(\xi,\eta)| < \varepsilon$ whenever
$|x - \xi| < \delta$ and $|y - \eta| < \delta$.

We now start from a grid $D_x \times D_y$ containing all the vertices of the rectangles
$J_1, \ldots, J_n$ and refine it until the distances $x_i - x_{i-1}$ and $y_j - y_{j-1}$ are smaller
than $\delta$. We then split the difference $S(D_x \times D_y) - s(D_x \times D_y)$ according to

$$ \sum_{I_{ij} \subset H} (F_{ij} - f_{ij})\,\mu(I_{ij}) + \sum_{I_{ij} \not\subset H} (F_{ij} - f_{ij})\,\mu(I_{ij}) . $$

The sum on the left is $\le \varepsilon\,\mu(I)$ because of the uniform continuity of $f(x,y)$ on $H$;
the sum on the right is $\le 4M\varepsilon$ because the union of the rectangles $I_{ij}$ (which do
not lie in $H$) is contained in $\bigcup_{k=1}^{n} J_k$ with an area smaller than $2\varepsilon$. Both estimates
together show that $S(D_x \times D_y) - s(D_x \times D_y)$ can be made arbitrarily small. □


Arbitrary Bounded Domains 


Dirichlet was particularly proud for his method of the discontinuous factor 
for multiple integrals. He used to say that it’s a very simple idea, and added 
with a smile, but one must have it. 

(H. Minkowski, Jahrber. DMV, 14 (1905), p. 161) 

Let $A \subset \mathbb{R}^2$ be a bounded domain contained in a rectangle $I$ (i.e., $A \subset I$) and let
$f : A \to \mathbb{R}$ be a bounded function. We want to find the volume under the surface
$z = f(x,y)$, with $(x,y)$ restricted to $A$.

The idea (Dirichlet 1839) is to consider the function $F : I \to \mathbb{R}$ defined by

(5.17)
$$
F(x,y) = \begin{cases}
f(x,y) & \text{if } (x,y) \in A, \\
0 & \text{else.}
\end{cases}
$$

If $F$ is integrable in the sense of Definition 5.1, then we define

(5.18)  $$ \iint_A f(x,y)\,d(x,y) = \iint_I F(x,y)\,d(x,y) . $$

A common situation is where $f : A \to \mathbb{R}$ is continuous on $A$ and where the
boundary of $A$, i.e.,

(5.19)  $$ \partial A := \bigl\{ (x,y) \in \mathbb{R}^2 \bigm| \text{each neighborhood of } (x,y) \text{ contains elements of } A \text{ and of } \complement A \bigr\}, $$

is a null set. In this case, the discontinuities of $F$ all lie in $\partial A$ and Theorem 5.6
implies the integrability of $F$.

Iterated Integrals. The set $A$ can often be described in one of the following ways:

(5.20)  $$ A = \{(x,y) \mid a \le x \le b,\ \varphi_1(x) \le y \le \varphi_2(x)\}, $$

(5.21)  $$ A = \{(x,y) \mid c \le y \le d,\ \psi_1(y) \le x \le \psi_2(y)\}, $$

where $\varphi_i(x)$ and $\psi_j(y)$ are known functions (see Fig. 5.7). In this case, the
formulas (5.11), (5.12), together with (5.18), yield

(5.22)  $$ \iint_A f(x,y)\,d(x,y) = \int_a^b \Bigl( \int_{\varphi_1(x)}^{\varphi_2(x)} f(x,y)\,dy \Bigr) dx, $$

(5.23)  $$ \iint_A f(x,y)\,d(x,y) = \int_c^d \Bigl( \int_{\psi_1(y)}^{\psi_2(y)} f(x,y)\,dx \Bigr) dy . $$

FIGURE 5.7. Domains of $\mathbb{R}^2$ (panels: Type (5.20); Type (5.21); Not type (5.20))


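Examples. 1) For the set $A = \{(x,y) \mid -a \le x \le a,\ x^2 \le y \le a^2\}$ we want to
compute the center of gravity

$$ \bar y = \frac{\iint_A y\,d(x,y)}{\iint_A 1\,d(x,y)} = \frac{3a^2}{5} . $$

We have the choice between (5.22) and (5.23):

$$ \iint_A y\,d(x,y) = \int_{-a}^{a} \Bigl( \int_{x^2}^{a^2} y\,dy \Bigr) dx = \int_{-a}^{a} \frac{a^4 - x^4}{2}\,dx = \frac{4a^5}{5}, $$

$$ \iint_A y\,d(x,y) = \int_0^{a^2} \Bigl( \int_{-\sqrt{y}}^{\sqrt{y}} y\,dx \Bigr) dy = \int_0^{a^2} 2y^{3/2}\,dy = \frac{4a^5}{5} . $$

In the same way one finds $\iint_A 1\,d(x,y) = 4a^3/3$.

2) Compute the moment of inertia of a disc $A = \{(x,y) \mid x^2 + y^2 \le a^2\}$ rotated
around one of its diameters:

(5.24)  $$ I = \iint_A y^2\,d(x,y) = \int_{-a}^{a} \int_{-\sqrt{a^2-x^2}}^{\sqrt{a^2-x^2}} y^2\,dy\,dx . $$

The value of the inner integral is $\frac{2}{3}(a^2 - x^2)^{3/2}$, and for the outer integral we use
the substitution $x = a \sin t$, $dx = a \cos t\,dt$, $\sqrt{a^2 - x^2} = a \cos t$. This gives

$$ I = \int_{-\pi/2}^{\pi/2} \frac{2}{3}\,a^4 \cos^4 t\,dt = \frac{a^4 \pi}{4} . $$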

The following fundamental theorem on coordinate changes will considerably sim- 
plify the computation of integrals such as (5.24) (see Example 5.8 below). 


The Transformation Formula for Double Integrals 


. . . this works for any other formula $\iint Z\,dx\,dy$, since it can be transformed
into $\iint Z(VR - ST)\,dt\,du$ by the same substitutions . . .

(Euler 1769b)

Integration by substitution (Eq. (II.4.14)),

$$ \int_{g(a)}^{g(b)} f(x)\,dx = \int_a^b f(g(u))\,g'(u)\,du, $$

is an important tool for computing integrals. If $g : [a,b] \to [c,d]$ is bijective (and
continuously differentiable), this formula can be written as

$$ \int_c^d f(x)\,dx = \int_a^b f(g(u))\,|g'(u)|\,du, $$

where the absolute value corrects the sign in the case of $g'(u) < 0$ (and hence
$g(b) < g(a)$). The following theorem gives the analog for double integrals.


(5.7) Theorem (Euler 1769b, Opera, vol. XVII, p. 303 for $n = 2$, Lagrange 1773,
Oeuvres, vol. 3, p. 624 for $n = 3$, Jacobi 1841, Werke, vol. 3, p. 436 for arbitrary
$n$). Let $f : A \to \mathbb{R}$ be continuous, $g : U \to \mathbb{R}^2$ ($U \subset \mathbb{R}^2$ open) be continuously
differentiable, and assume that

i) $A = g(B)$; the sets $A, B \subset \mathbb{R}^2$ are compact; $\partial A$, $\partial B$ are null sets;

ii) $g$ is injective on $B \setminus N$, where $N$ is a null set.

Then, we have

(5.25)  $$ \iint_A f(x,y)\,d(x,y) = \iint_B f(g(u,v))\,|\det g'(u,v)|\,d(u,v) . $$

Polar Coordinates. One of the most important applications of Theorem 5.7 is
when

(5.26)  $$ g(r,\varphi) = (x,y), \qquad x = r \cos\varphi, \quad y = r \sin\varphi $$

(polar coordinates, see Sect. I.5) and when $A = \{(x,y) \mid x^2 + y^2 \le R^2\}$. With
$B = [0,R] \times [0,2\pi]$, the assumption (i) of Theorem 5.7 is satisfied. The function $g$
of (5.26) is not injective on $B$ (we have $g(r,0) = g(r,2\pi)$ for all $r$, and $g(0,\varphi) =
(0,0)$ for all $\varphi$). However, if we remove from $B$ the null set $N = (\{0\} \times [0,2\pi]) \cup
([0,R] \times \{2\pi\})$ (see Fig. 5.8b), the function $g$ becomes injective on $B \setminus N$. Since

$$ \det g'(r,\varphi) = \det \begin{pmatrix} \cos\varphi & -r \sin\varphi \\ \sin\varphi & r \cos\varphi \end{pmatrix} = r, $$

it follows from Theorem 5.7 and Eq. (5.11) that

(5.27)  $$ \iint_{x^2+y^2 \le R^2} f(x,y)\,d(x,y) = \int_0^{2\pi} \int_0^{R} f(r \cos\varphi, r \sin\varphi)\,r\,dr\,d\varphi . $$
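Formula (5.27) translates directly into a numerical procedure. The sketch below approximates the right-hand side by midpoint sums in $r$ and $\varphi$; the function name `polar_integral` and the discretization are illustrative assumptions, not part of the text.

```python
import math


def polar_integral(f, R, nr=200, nphi=200):
    """Approximates the integral of f over the disc x^2 + y^2 <= R^2
    via (5.27): integrate f(r cos(phi), r sin(phi)) * r over
    [0, R] x [0, 2 pi] with the midpoint rule in both variables."""
    hr = R / nr
    hphi = 2.0 * math.pi / nphi
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * hr
        for j in range(nphi):
            phi = (j + 0.5) * hphi
            # the factor r is |det g'(r, phi)|
            total += f(r * math.cos(phi), r * math.sin(phi)) * r
    return total * hr * hphi
```

With $f(x,y) = y^2$ and $R = 1$ the result should be close to $\pi/4$, the moment of inertia of the unit disc; with $f \equiv 1$ it returns the area $\pi R^2$.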


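Proof of Theorem 5.7.

Main Ideas. We cover $B$ by a division of closed squares $J_\beta$ with side length $\delta$
(see Fig. 5.9, left picture), set $\mathcal{B} = \{\beta \mid J_\beta \cap B \ne \emptyset\}$, and let $(u_\beta, v_\beta)$ be the left
bottom vertex of $J_\beta$. We assume that $\delta$ is sufficiently small, so that all $J_\beta$ ($\beta \in \mathcal{B}$)
still lie in $U$. The image set $g(J_\beta)$ of $J_\beta$ is approximately a parallelogram with
sides (Fig. 5.9, right picture; Fig. 5.10)

(5.28)  $$ a = \frac{\partial g}{\partial u}(u_\beta, v_\beta)\,\delta, \qquad b = \frac{\partial g}{\partial v}(u_\beta, v_\beta)\,\delta $$

(see Example 3.2). Now, from elementary geometry we know that the area of this
parallelogram is equal to the determinant²

(5.29)  $$ \text{area}_{\text{parall.}} = |\det(a\ \, b)| = \Bigl| \det \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} \Bigr| = |a_1 b_2 - a_2 b_1|, $$

and, inspired by Eq. (5.9), we have

$$
\begin{aligned}
\iint_A f(x,y)\,d(x,y) &\approx \sum_{\beta \in \mathcal{B}} f(g(u_\beta, v_\beta)) \cdot (\text{area of } g(J_\beta)) \\
&\approx \sum_{\beta \in \mathcal{B}} f(g(u_\beta, v_\beta))\,|\det g'(u_\beta, v_\beta)|\,\mu(J_\beta) \\
&\approx \iint_B f(g(u,v))\,|\det g'(u,v)|\,d(u,v) .
\end{aligned}
$$

This motivates the validity of Eq. (5.25).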

Rigorous Estimates. The integrands in Eq. (5.25) are continuous on $A$ and $B$,
respectively. Since $A$ and $B$ are compact, these functions are bounded. Moreover,
$\partial A$ and $\partial B$ are null sets, so that by (5.18) the two integrals in Eq. (5.25) exist. In
the following we extend the domain of $f$ to $\mathbb{R}^2$ by putting $f(x,y) = 0$ outside
of $A$.

In order to grasp the precise meaning of the left integral of (5.25), we
introduce, in addition to the above division of $B$, a division of $A$ into squares $I_\alpha$, set
$\mathcal{A} = \{\alpha \mid I_\alpha \cap A \ne \emptyset\}$, and choose $(x_\alpha, y_\alpha) \in I_\alpha \cap A$ (these are the fish-eyes
in Fig. 5.10). Equation (5.25) will be proved by showing that the difference of the
Riemann sums of the two integrals (see Theorem 5.2 and Eq. (5.9))

(5.30)  $$ \sum_{\alpha \in \mathcal{A}} f(x_\alpha, y_\alpha)\,\mu(I_\alpha) - \sum_{\beta \in \mathcal{B}} f(g(u_\beta, v_\beta))\,|\det g'(u_\beta, v_\beta)|\,\mu(J_\beta) $$

is smaller than $\varepsilon$ for any given $\varepsilon > 0$. It turns out that the side length of the squares
$I_\alpha$ must be much smaller than $\delta$ (the side length of $J_\beta$). We take it $\le \varepsilon \cdot \delta$.

² The two expressions on the left and on the right of (5.29) are

i) invariant under transformations of the type $b \mapsto b + \lambda a$ (Cavalieri's principle), and

ii) equal for rectangles parallel to the axes (i.e., for diagonal matrices); see Fig. 5.8a.

For more details see Strang (1976, p. 164).



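Partition of $\mathcal{A}$. The left sum of (5.30) contains many more terms than the right
one. In order to compare corresponding terms in this difference, we partition the
set $\mathcal{A}$ as

$$ \mathcal{A} = \bigcup_{\beta \in \mathcal{B}} \mathcal{A}_\beta \qquad\text{(disjoint union)}, $$

in such a way that

(5.31a)  $$ (x_\alpha, y_\alpha) \in g(J_\beta) \quad\text{if } \alpha \in \mathcal{A}_\beta, $$

(5.31b)  $$ \alpha \in \mathcal{A}_\beta \quad\text{if } I_\alpha \subset g(J_\beta) \text{ and } J_\beta \subset B \setminus N $$

(see Fig. 5.10). For a given $\alpha \in \mathcal{A}$ we can, since $(x_\alpha, y_\alpha) \in A = g(B) \subset
\bigcup_{\beta \in \mathcal{B}} g(J_\beta)$, always find a $\beta$ which satisfies (5.31a). In order to be able to satisfy
(5.31b), we have to show that there is at most one $\beta \in \mathcal{B}$ with $J_\beta \subset B \setminus N$ such
that $I_\alpha \subset g(J_\beta)$. Suppose that $I_\alpha \subset g(J_\beta) \cap g(J_{\beta'})$ for some $\beta \ne \beta'$. Since
$g$ is injective on $B \setminus N$, we have $g(J_\beta) \cap g(J_{\beta'}) \subset g(J_\beta \cap J_{\beta'})$, so that also
$I_\alpha \subset g(J_\beta \cap J_{\beta'})$. But $J_\beta \cap J_{\beta'}$ is either empty, or a point, or a segment of a
line, so that $g(J_\beta \cap J_{\beta'})$ is a null set by Lemma 5.5. Hence $I_\alpha \subset g(J_\beta \cap J_{\beta'})$ is
impossible for $\beta \ne \beta'$.

Once the sets $\mathcal{A}_\beta$ are determined, we use $\sum_{\alpha \in \mathcal{A}} = \sum_{\beta \in \mathcal{B}} \sum_{\alpha \in \mathcal{A}_\beta}$, write the
expression of Eq. (5.30) as $\sum_{\beta \in \mathcal{B}} D_\beta$ with

(5.32)  $$ D_\beta = \sum_{\alpha \in \mathcal{A}_\beta} f(x_\alpha, y_\alpha)\,\mu(I_\alpha) - f(g(u_\beta, v_\beta))\,|\det g'(u_\beta, v_\beta)|\,\mu(J_\beta), $$

and estimate these terms. For the moment, we consider only so-called “interior”
$J_\beta$'s, i.e., we suppose that $J_\beta \subset B \setminus N$. We write $D_\beta$ as

(5.33a)  $$ D_\beta = \sum_{\alpha \in \mathcal{A}_\beta} \bigl( f(x_\alpha, y_\alpha) - f(g(u_\beta, v_\beta)) \bigr)\,\mu(I_\alpha) $$

(5.33b)  $$ \qquad + f(g(u_\beta, v_\beta)) \Bigl( \sum_{\alpha \in \mathcal{A}_\beta} \mu(I_\alpha) - |\det g'(u_\beta, v_\beta)|\,\mu(J_\beta) \Bigr) $$

and estimate these two expressions separately.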

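Estimation of (5.33a). Since $g(u,v)$ is continuously differentiable, $g'(u,v)$ is
bounded on the compact set $B$ (Theorem 2.3), i.e.,

(5.34)  $$ \|g'(u,v)\| \le M_1 \qquad\text{for } (u,v) \in B . $$

Hence, the Mean Value Theorem 3.7 implies that

$$ \|(x_\alpha, y_\alpha)^T - g(u_\beta, v_\beta)\| \le M_1 \cdot \delta \cdot \sqrt{2} \qquad\text{for } \alpha \in \mathcal{A}_\beta $$

(indeed, $(x_\alpha, y_\alpha)$ lies in $g(J_\beta)$ and the points of $J_\beta$ have from $(u_\beta, v_\beta)$ a distance
of at most $\delta \cdot \sqrt{2}$). It then follows from the uniform continuity of $f$ on $A$ ($f$
is continuous on the compact set $A$) that $|f(x_\alpha, y_\alpha) - f(g(u_\beta, v_\beta))| < \varepsilon$ for
sufficiently small $\delta$ (remember that $g(J_\beta) \subset A$ since $J_\beta$ is interior). Therefore,

(5.35)  $$ \Bigl| \sum_{\alpha \in \mathcal{A}_\beta} \bigl( f(x_\alpha, y_\alpha) - f(g(u_\beta, v_\beta)) \bigr)\,\mu(I_\alpha) \Bigr| \le \varepsilon \sum_{\alpha \in \mathcal{A}_\beta} \mu(I_\alpha) . $$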

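Estimation of (5.33b). We now must concentrate more seriously on the question
how precisely the set $g(J_\beta)$ is approached by the parallelogram spanned by the
vectors $a$ and $b$ in (5.28). We denote this set by

$$ R_\beta = \Bigl\{ g(u_\beta, v_\beta) + \frac{\partial g}{\partial u}(u_\beta, v_\beta)\,s + \frac{\partial g}{\partial v}(u_\beta, v_\beta)\,t \Bigm| s \in [0,\delta],\ t \in [0,\delta] \Bigr\} . $$

We compute the distance of two corresponding points $g(u_\beta + s, v_\beta + t)$ in $g(J_\beta)$
and $g(u_\beta, v_\beta) + \frac{\partial g}{\partial u}(u_\beta, v_\beta)\,s + \frac{\partial g}{\partial v}(u_\beta, v_\beta)\,t$ in $R_\beta$ in the following way: Equation
(III.6.16) written for $F(\tau) = g(u_\beta + \tau s, v_\beta + \tau t)$ means that

$$ g(u_\beta + s, v_\beta + t) - g(u_\beta, v_\beta) = \int_0^1 g'(u_\beta + \tau s, v_\beta + \tau t) \begin{pmatrix} s \\ t \end{pmatrix} d\tau . $$

Subtracting $\partial g/\partial u(u_\beta, v_\beta) \cdot s + \partial g/\partial v(u_\beta, v_\beta) \cdot t$ from both sides, we obtain

$$
\begin{aligned}
& \Bigl\| g(u_\beta + s, v_\beta + t) - \Bigl( g(u_\beta, v_\beta) + \frac{\partial g}{\partial u}(u_\beta, v_\beta)\,s + \frac{\partial g}{\partial v}(u_\beta, v_\beta)\,t \Bigr) \Bigr\| \\
& \qquad = \Bigl\| \int_0^1 \bigl( g'(u_\beta + \tau s, v_\beta + \tau t) - g'(u_\beta, v_\beta) \bigr) \begin{pmatrix} s \\ t \end{pmatrix} d\tau \Bigr\| \le \sqrt{2}\,\varepsilon\delta
\end{aligned}
$$

for $0 \le s, t \le \delta$. The last estimate follows from the uniform continuity of $g'$ on
the compact set $B$ (recall that $J_\beta$ is interior).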

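Next we enclose $R_\beta$ between two sets

$R_\beta^- \subset R_\beta \subset R_\beta^+$

where (see Fig. 5.10)

$R_\beta^+$ = {set of points with distance $\le 2\sqrt{2}\,\varepsilon\delta$ from the closest point of $R_\beta$}

$R_\beta^-$ = {set of points in $R_\beta$ with distance $\ge 2\sqrt{2}\,\varepsilon\delta$ from the border}.

Since the distance $2\sqrt{2}\,\varepsilon\delta$ chosen in these definitions is twice
$\sqrt{2}\,\varepsilon\delta$, which, on one side, is the maximal distance between
corresponding points of $g(J_\beta)$ and $R_\beta$, and on the other side the maximal
diameter of the squares $I_\alpha$, the sets $R_\beta^-$ and $R_\beta^+$ also enclose,
because of (5.31a) and (5.31b), the union of $I_\alpha$ for $\alpha \in \mathcal{V}_\beta$
(see Fig. 5.10 again)

$R_\beta^- \subset \bigcup_{\alpha \in \mathcal{V}_\beta} I_\alpha \subset R_\beta^+.$

Since $R_\beta^+ \setminus R_\beta^-$ is a “ring” of length $\le 4M_1\delta$ (see (5.34))
and of “thickness” $\le 4\sqrt{2}\,\varepsilon\delta$, the above inclusions lead to the
estimate

$\Bigl| \sum_{\alpha \in \mathcal{V}_\beta} \mu(I_\alpha) - \mu(R_\beta) \Bigr| \le \mu(R_\beta^+ \setminus R_\beta^-) \le (4M_1\delta)(4\sqrt{2}\,\varepsilon\delta).$

Consequently, we have

(5.36) $\Bigl| f(g(u_\beta, v_\beta)) \Bigl( \sum_{\alpha \in \mathcal{V}_\beta} \mu(I_\alpha) - |\det g'(u_\beta, v_\beta)|\,\mu(J_\beta) \Bigr) \Bigr| \le C\varepsilon\delta^2 = C\varepsilon\,\mu(J_\beta)$

with $C = M \cdot 4M_1 \cdot 4\sqrt{2}$.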

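Finale. If $J_\beta \not\subset B \setminus N$ (so that $J_\beta$ intersects the null set
$\partial B \cup N$), we estimate $D_\beta$ of Eq. (5.32) by $|D_\beta| \le M_2\,\mu(J_\beta)$,
where $M_2$ is a constant depending on bounds of $f$ and $g'$. If $\delta$ is sufficiently
small, it follows from (5.15) that the sum of these $|D_\beta|$ is $\le M_2\varepsilon$.
For the remaining $J_\beta$ we use (5.35) and (5.36), together with (5.33), and obtain

$|D_\beta| \le \varepsilon \sum_{\alpha \in \mathcal{V}_\beta} \mu(I_\alpha) + C\varepsilon\,\mu(J_\beta).$

All in all, the difference (5.30) of the Riemann sums, i.e., $\sum_{\beta \in \mathcal{B}} D_\beta$,
is arbitrarily small ($\le \text{Const} \cdot \varepsilon$). □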
(5.8) Example. Let $A = \{(x,y) \mid x^2 + y^2 \le R^2\}$ be the disc of radius $R$. Its area
can be computed as

$$\iint_A 1 \cdot d(x,y) = \int_0^{2\pi}\!\int_0^R r\,dr\,d\varphi = \frac{R^2}{2} \cdot 2\pi = R^2\pi.$$

The moment of inertia with respect to a rotation around a diameter is

$$\iint_A y^2\,d(x,y) = \int_0^{2\pi}\!\int_0^R r^2\sin^2\varphi \cdot r\,dr\,d\varphi = \frac{R^4\pi}{4}.$$

The moment of inertia with respect to a central rotation axis orthogonal to the disc
is

$$\iint_A (x^2 + y^2)\,d(x,y) = \int_0^{2\pi}\!\int_0^R r^2 \cdot r\,dr\,d\varphi = \frac{R^4\pi}{2}.$$
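The three values of this example can be checked numerically. The following is only a sketch: the helper name `polar_integral`, the midpoint rule, the grid size `n` and the test radius `R = 2.0` are illustrative assumptions, not part of the text. It evaluates each double integral in polar coordinates, including the factor $r$.

```python
import math

def polar_integral(f, R, n=200):
    """Midpoint-rule value of the double integral of f(x, y) over the disc
    x^2 + y^2 <= R^2, evaluated in polar coordinates (integrand times r)."""
    dr = R / n
    dphi = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            phi = (j + 0.5) * dphi
            x = r * math.cos(phi)
            y = r * math.sin(phi)
            total += f(x, y) * r  # the factor r is |det g'| for polar coordinates
    return total * dr * dphi

R = 2.0  # arbitrary test radius (assumption)
area = polar_integral(lambda x, y: 1.0, R)
about_diameter = polar_integral(lambda x, y: y * y, R)
about_axis = polar_integral(lambda x, y: x * x + y * y, R)

print(area, math.pi * R**2)
print(about_diameter, math.pi * R**4 / 4)
print(about_axis, math.pi * R**4 / 2)
```

Each printed pair should agree to several digits; the remaining difference is the discretization error of the midpoint rule in the radial direction.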


FIGURE 5.11. Spherical coordinates


Spherical Coordinates. The extension of the results of this section to higher
dimensions can be carried out without any major difficulties. Let us give an
interesting application of the transformation formula (5.25) in three dimensions.

We consider spherical coordinates $g(r, \varphi, \theta) = (x, y, z)$ defined by (Fig. 5.11)

(5.37) $x = r\cos\varphi\sin\theta, \quad y = r\sin\varphi\sin\theta, \quad z = r\cos\theta$

and are interested in triple integrals over a sphere
$A = \{(x,y,z) \mid x^2 + y^2 + z^2 \le R^2\}$. With $B = [0,R] \times [0,2\pi] \times [0,\pi]$
and $N = \partial B$, all the assumptions of Theorem 5.7 are satisfied. Computing the
Jacobian matrix of $g$,

$$g'(r, \varphi, \theta) = \begin{pmatrix} \cos\varphi\sin\theta & -r\sin\varphi\sin\theta & r\cos\varphi\cos\theta \\ \sin\varphi\sin\theta & r\cos\varphi\sin\theta & r\sin\varphi\cos\theta \\ \cos\theta & 0 & -r\sin\theta \end{pmatrix},$$
we obtain for its determinant $\det g'(r, \varphi, \theta) = -r^2\sin\theta$, whence (Lagrange 1773)

(5.38) $\iiint_A f(x,y,z)\,d(x,y,z) = \iiint_B \tilde f(r, \varphi, \theta)\,r^2\sin\theta\,d(r, \varphi, \theta),$

with $\tilde f(r, \varphi, \theta) = f(r\cos\varphi\sin\theta,\ r\sin\varphi\sin\theta,\ r\cos\theta)$.
Looking at Fig. 5.11, this formula can also be understood, as Lagrange says,
“directement sans aucun calcul”.

The volume of the sphere is obtained by taking $f(x,y,z) = 1$,

$$\iiint_A 1 \cdot d(x,y,z) = \int_0^{\pi}\!\int_0^{2\pi}\!\int_0^R r^2\sin\theta\,dr\,d\varphi\,d\theta = \frac{4R^3\pi}{3}.$$

The moment of inertia with respect to an axis through the origin is

$$\iiint_A (x^2 + y^2)\,d(x,y,z) = \int_0^{\pi}\!\int_0^{2\pi}\!\int_0^R r^2\sin^2\theta \cdot r^2\sin\theta\,dr\,d\varphi\,d\theta = \frac{8R^5\pi}{15}.$$
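As a sanity check of the determinant and of the two integrals, here is a small numerical sketch. The function names, the sample point, the grid size and the radius `R = 1.5` are assumptions made for illustration; the integrands depend only on $r$ and $\theta$, so the $\varphi$-integration is replaced by the factor $2\pi$.

```python
import math

def jacobian_det(r, phi, theta):
    """Determinant of the Jacobian matrix of (5.37), by cofactor expansion
    along the third row (cos(theta), 0, -r sin(theta))."""
    a11 = math.cos(phi) * math.sin(theta)
    a12 = -r * math.sin(phi) * math.sin(theta)
    a13 = r * math.cos(phi) * math.cos(theta)
    a21 = math.sin(phi) * math.sin(theta)
    a22 = r * math.cos(phi) * math.sin(theta)
    a23 = r * math.sin(phi) * math.cos(theta)
    a31 = math.cos(theta)
    a33 = -r * math.sin(theta)
    return a31 * (a12 * a23 - a13 * a22) + a33 * (a11 * a22 - a12 * a21)

def ball_integral(h, R, n=400):
    """Midpoint-rule value of the triple integral over the ball of radius R of
    a function h(r, theta) that does not depend on phi; the phi-integration
    contributes the factor 2*pi, and r^2 sin(theta) = |det g'|."""
    dr = R / n
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for k in range(n):
            theta = (k + 0.5) * dtheta
            total += h(r, theta) * r * r * math.sin(theta)
    return 2.0 * math.pi * total * dr * dtheta

R = 1.5  # arbitrary test radius (assumption)
volume = ball_integral(lambda r, theta: 1.0, R)
# x^2 + y^2 = r^2 sin^2(theta) in spherical coordinates
inertia = ball_integral(lambda r, theta: (r * math.sin(theta)) ** 2, R)

print(jacobian_det(1.3, 0.7, 2.1), -1.3**2 * math.sin(2.1))
print(volume, 4 * math.pi * R**3 / 3)
print(inertia, 8 * math.pi * R**5 / 15)
```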


Integrals with Unbounded Domain 

In certain situations, one is confronted with the computation of an integral over an 
unbounded domain. As in Sect. III.8 (improper integrals), this can be managed by 
taking a limit. We shall illustrate this on some interesting examples. 

“Gaussian” Integral. Suppose we want to compute $I = \int_0^\infty e^{-x^2}\,dx$. The idea
is to take the square of $I$ and to transform it into a double integral

(5.39) $I^2 = \lim_{R \to \infty} \Bigl( \int_0^R e^{-x^2}\,dx \Bigr) \Bigl( \int_0^R e^{-y^2}\,dy \Bigr) = \lim_{R \to \infty} \iint_{A_R} e^{-x^2 - y^2}\,d(x,y),$

where $A_R = [0,R] \times [0,R]$. The integrand of the double integral suggests taking
polar coordinates. Putting $D_R = \{(x,y) \mid x^2 + y^2 \le R^2,\ x \ge 0,\ y \ge 0\}$, we
have

(5.40) $\lim_{R \to \infty} \iint_{D_R} e^{-x^2 - y^2}\,d(x,y) = \lim_{R \to \infty} \int_0^{\pi/2}\!\int_0^R e^{-r^2}\,r\,dr\,d\varphi = \frac{\pi}{4}.$

Here, the additional “r” originating from Eq. (5.27) was most welcome and
allowed integration of the inner integral with an easy substitution. The question is
whether the two limits in (5.39) and (5.40) are equal. If $f(x,y) \ge 0$ (as is the case
here), we have

$$\iint_{D_R} f(x,y)\,d(x,y) \le \iint_{A_R} f(x,y)\,d(x,y) \le \iint_{D_{\sqrt{2}R}} f(x,y)\,d(x,y)$$

as a consequence of the inclusion $D_R \subset A_R \subset D_{\sqrt{2}R}$ (see small drawing
to the right). Thus, the existence of $\lim_{R \to \infty} \iint_{D_R} f(x,y)\,d(x,y)$
implies that of $\lim_{R \to \infty} \iint_{A_R} f(x,y)\,d(x,y)$, and both limits have
the same value. Consequently, $I = \sqrt{\pi}/2$.

There is also an interesting connection with the gamma function,

(5.41) $\sqrt{\pi} = 2I = 2\int_0^\infty e^{-x^2}\,dx = \int_0^\infty e^{-t}\,\frac{dt}{\sqrt{t}} = \Gamma(1/2)$

(see Definition III.8.10).
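The sandwich argument can be observed numerically. In the sketch below, the closed form for the quarter disc comes from the inner integral of (5.40), $\int_0^R e^{-r^2} r\,dr = (1 - e^{-R^2})/2$; expressing the integral over the square through `math.erf` is an added convenience, not something the text uses, and the function names and sample radii are assumptions.

```python
import math

def quarter_disc(R):
    """Integral of exp(-x^2 - y^2) over D_R, from (5.40): (pi/4)(1 - exp(-R^2))."""
    return math.pi / 4 * (1.0 - math.exp(-R * R))

def square(R):
    """Integral of exp(-x^2 - y^2) over A_R = [0,R]^2, i.e. the square of
    int_0^R exp(-x^2) dx = (sqrt(pi)/2) erf(R)."""
    return (math.sqrt(math.pi) / 2 * math.erf(R)) ** 2

for R in (0.5, 1.0, 2.0, 4.0):
    lower, middle, upper = quarter_disc(R), square(R), quarter_disc(math.sqrt(2) * R)
    print(R, lower, middle, upper)   # lower <= middle <= upper, all tending to pi/4

# Connection (5.41) with the gamma function
print(math.gamma(0.5), math.sqrt(math.pi))
```

The three columns squeeze together as $R$ grows, which is exactly why the limits in (5.39) and (5.40) coincide.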



FIGURE 5.12. Study of the transformation (5.43)


A Product Formula for the Gamma Function. From Definition III.8.10, we
have

$$\Gamma(\alpha) = \int_0^\infty e^{-x} x^{\alpha-1}\,dx, \qquad \Gamma(\beta) = \int_0^\infty e^{-y} y^{\beta-1}\,dy,$$

so that (see Jacobi 1834, Werke, vol. VI, p. 62)

(5.42) $\Gamma(\alpha)\Gamma(\beta) = \lim_{R \to \infty} \iint_{A_R} e^{-x-y} x^{\alpha-1} y^{\beta-1}\,d(x,y),$

where, as above, $A_R = [0,R] \times [0,R]$. This time, we use the transformation
(Fig. 5.12)

(5.43) $x + y = u,\quad y = v \qquad\Longleftrightarrow\qquad x = u - v,\quad y = v$

whose Jacobian matrix satisfies
$\det g'(u,v) = \det \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = 1$. Putting
$D_R = \{(x,y) \mid x \ge 0,\ y \ge 0,\ x + y \le R\}$, we find that

$$\lim_{R \to \infty} \iint_{D_R} e^{-x-y} x^{\alpha-1} y^{\beta-1}\,d(x,y) = \lim_{R \to \infty} \int_0^R e^{-u} \Bigl( \int_0^u (u-v)^{\alpha-1} v^{\beta-1}\,dv \Bigr) du$$

(5.44) $\qquad = \lim_{R \to \infty} \int_0^R e^{-u} u^{\alpha+\beta-1}\,du \cdot \int_0^1 (1-t)^{\alpha-1} t^{\beta-1}\,dt,$

where we have used the substitution $v = u \cdot t$ ($0 \le t \le 1$). The same argument
as for the Gaussian integral guarantees that the two limits of (5.42) and (5.44) are
equal. In (5.44), the so-called beta function appears,

(5.45) $B(\alpha, \beta) := \int_0^1 (1-t)^{\alpha-1} t^{\beta-1}\,dt,$

and we have the formula

(5.46) $B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)},$

which generalizes Eq. (II.4.34) to arbitrary exponents.
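Formula (5.46) is easy to test numerically. The sketch below compares a midpoint-rule approximation of the integral (5.45) with the quotient of gamma functions computed by `math.gamma`; the function names, the grid size and the sample exponents are assumptions, and the quadrature is only intended for $\alpha, \beta \ge 1$, where the integrand stays bounded.

```python
import math

def beta_midpoint(a, b, n=20000):
    """Midpoint-rule approximation of B(a, b) = int_0^1 (1-t)^(a-1) t^(b-1) dt.
    Only meant for a, b >= 1, where the integrand is bounded on [0, 1]."""
    h = 1.0 / n
    return h * sum((1.0 - t) ** (a - 1.0) * t ** (b - 1.0)
                   for t in ((i + 0.5) * h for i in range(n)))

def beta_gamma(a, b):
    """Right-hand side of (5.46)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

print(beta_midpoint(2.5, 3.5), beta_gamma(2.5, 3.5))
# Integer exponents: B(3, 4) = 2! 3! / 6! = 1/60
print(beta_midpoint(3, 4), beta_gamma(3, 4), 1 / 60)
```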

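Counterexample. The function $f(x,y) = (x - y)/(x + y)^3$ is continuous on
$A = [1,\infty) \times [1,\infty)$. Nevertheless, we have (see also Exercise 5.3)

(5.47) $\int_1^\infty \Bigl( \int_1^\infty \frac{x-y}{(x+y)^3}\,dx \Bigr) dy = +\frac12, \qquad \int_1^\infty \Bigl( \int_1^\infty \frac{x-y}{(x+y)^3}\,dy \Bigr) dx = -\frac12,$

which violates Eqs. (5.11) and (5.12). This phenomenon is only possible for an
unbounded domain $A$ and a function $f$ that changes sign on $A$.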
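To see how the order of the two limits matters, one can integrate over the bounded rectangle $[1,X] \times [1,Y]$ first and only then let $X$ and $Y$ grow. The sketch below is an illustration under assumptions: the closed form `J` is derived here from the antiderivative $-x/(x+y)^2$ of $f$ with respect to $x$ and is not a formula quoted from the text; a crude midpoint rule on a small rectangle serves as a cross-check.

```python
def f(x, y):
    return (x - y) / (x + y) ** 3

def J(X, Y):
    """Closed form of int_1^Y ( int_1^X f(x, y) dx ) dy, obtained from the
    antiderivative -x/(x+y)^2 of f with respect to x."""
    return (0.5 - 1.0 / (1.0 + Y)) + (X / (X + Y) - X / (X + 1.0))

def J_midpoint(X, Y, n=200):
    """Midpoint-rule cross-check of J on the bounded rectangle [1,X] x [1,Y]."""
    hx, hy = (X - 1.0) / n, (Y - 1.0) / n
    total = 0.0
    for i in range(n):
        x = 1.0 + (i + 0.5) * hx
        for j in range(n):
            total += f(x, 1.0 + (j + 0.5) * hy)
    return total * hx * hy

print(J(3.0, 5.0), J_midpoint(3.0, 5.0))  # the two values should agree closely
print(J(1e12, 1e3))   # X -> infinity first: close to +1/2
print(J(1e3, 1e12))   # Y -> infinity first: close to -1/2
print(J(1e3, 1e3))    # squares X = Y: the integral vanishes by antisymmetry
```

On every bounded rectangle the two iterated integrals agree; the discrepancy in (5.47) comes entirely from the way the rectangle is sent to infinity.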


Exercises 


5.1 Let $g : [a,b] \to \mathbb{R}$ be a bounded function and assume that all its Riemann
sums converge to a fixed value $\alpha$ if $\max_i (x_i - x_{i-1}) \to 0$. Prove that $g(x)$
is integrable (in the sense of Riemann) and that $\int_a^b g(x)\,dx = \alpha$.

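5.2 For $I := [0,\pi] \times [0,1]$ define $f : I \to \mathbb{R}$ by

$$f(x,y) = \begin{cases} \cos x & \text{if } y \in \mathbb{Q} \\ 0 & \text{if not.} \end{cases}$$

Which of the two integrals

$$\int_0^1 \Bigl( \int_0^\pi f(x,y)\,dx \Bigr) dy \quad\text{and}\quad \int_0^\pi \Bigl( \int_0^1 f(x,y)\,dy \Bigr) dx$$

exists? Is the function $f : I \to \mathbb{R}$ integrable?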
5.3 Show that

$$\int_0^1\!\int_0^1 \frac{x-y}{(x+y)^3}\,dx\,dy = -\frac12 \;\ne\; \int_0^1\!\int_0^1 \frac{x-y}{(x+y)^3}\,dy\,dx = +\frac12$$

(Fig. 5.13). Is this relation a contradiction to Eqs. (5.11) and (5.12)?

Hint. Use

$$\frac{\partial}{\partial x} \Bigl( \frac{-x}{(x+y)^2} \Bigr) = \frac{x-y}{(x+y)^3}.$$

FIGURE 5.13. Function with noncommuting iterated integrals (stereogram)


5.4 Try to compute 


2 r cos ip -+ 


0<Æ<1. 


There is a better way of computing this integral, where the formula (see the 
Example for (II.5.21)) 


dip 


f b cos ip 


a > \b\ 


is helpful (the result is I = 0; see also Exercise III.5.4). 

5.5 Prove that von Koch’s curve of Fig. 5.6, though of infinite length, represents
a null set.

Hint. Let the distance of the two end points be 1. Considering the uppermost
curve of Fig. 5.6, we see that it is contained in a rectangle of sides 1 and 1/3.
The next curve is contained in the union of four rectangles of sides 1/3 and
1/9, and so on.

5.6 Show that “Sierpinski’s triangle” (Fig. 1.9) is a null set in $\mathbb{R}^2$.

5.7 The set $\varphi([0,1])$, with



is a null set despite the fact that the function $\varphi$ does not satisfy

$\|\varphi(t) - \varphi(s)\| \le M|t - s|$ for all $t, s \in [0,1]$.
5.8 Compute the area of the surface enclosed by the loop of the folium cartesii
(4.29). Try two methods: a) polar coordinates; b) the change of coordinates

$u = x + y, \quad v = x - y.$

5.9 Compute the area of the surface enclosed by the loops of the lemniscate

$(x^2 + y^2)^2 - 2(x^2 - y^2) = 0.$

Try two methods: a) polar coordinates; b) iterated integrals.

5.10 Let

$B_n(r) = \{(x_1, \ldots, x_n) \in \mathbb{R}^n \,;\ x_1^2 + \ldots + x_n^2 \le r^2\}$

be the ball of radius $r$ in $\mathbb{R}^n$. Show that its volume is

$$\mu(B_n(r)) = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}\,r^n.$$

Indication. Proceed by induction on $n \ge 1$. A formula derived above for the
beta function will be helpful.

5.11 Compute the volume of the simplex

$A_n(c) = \{(x_1, \ldots, x_n) \in \mathbb{R}^n \,;\ x_i \ge 0 \text{ and } x_1 + x_2 + \ldots + x_n \le c\}.$

The result is $c^n/n!$.

5.12 Compute 

$$\iiint_T xyz(1 - x - y - z)\,dx\,dy\,dz,$$

where $T$ is the tetrahedron defined by

$T = \{(x,y,z) \,;\ x \ge 0,\ y \ge 0,\ z \ge 0,\ x + y + z \le 1\}.$

Use the substitution 


The result is 1/7!. 

5.13 Let $A_R = [0,R] \times [0,R]$, $D_R = \{(x,y) \mid x^2 + y^2 \le R^2\}$, and consider the
limits

$$\lim_{R \to \infty} \iint_{A_R} \sin(x^2 + y^2)\,d(x,y), \qquad \lim_{R \to \infty} \iint_{D_R} \sin(x^2 + y^2)\,d(x,y).$$

Prove that the first limit exists, whereas the second does not.

Hint. For the first integral use $\sin(x^2 + y^2) = \sin x^2 \cos y^2 + \cos x^2 \sin y^2$
and prove that $\int_0^R \sin x^2\,dx$ converges to a limit for $R \to \infty$. For the second
integral use polar coordinates.
5.14 Prove that

Then, deduce from these relations the statement of Eq. (II.6.9).

Hint. Substituting $x = u\sqrt{z}$ ($z$ is a positive parameter) in Eq. (5.41) yields

(5.49) $\dfrac{\sqrt{\pi}}{\sqrt{z}} = 2\int_0^\infty e^{-u^2 z}\,du.$

Multiply this equation by $e^{iz}$, integrate from $A > 0$ to $B$, change the order
of integration in the iterated integrals, and consider the limits $B \to \infty$ and
$A \to 0$. Justify all steps.

Remark. With deeper results of complex analysis, this becomes an easy ex- 
ercise. 
page 1: ... da der Lehrer einsichtig genug war den ungewöhnlichen Schüler (Jacobi)
gewähren zu lassen und es zu gestatten, daß dieser sich mit Eulers Introductio
beschäftigte, während die übrigen Schüler mühsam ....

(Dirichlet 1852, Gedächtnisrede auf Jacobi, in Jacobi’s Werke, vol. I, p. 4)
page 2: Tant que l’Algèbre et la Géométrie ont été séparées, leurs progrès ont été lents
et leurs usages bornés; mais lorsque ces deux Sciences se sont réunies, elles se
sont prêté des forces mutuelles et ont marché ensemble d’un pas rapide vers
la perfection. C’est à Descartes qu’on doit l’application de l’Algèbre à la
Géométrie, application qui est devenue la clef des plus grandes découvertes dans
toutes les branches des Mathématiques.

(Lagrange 1795, Oeuvres, vol. 7, p. 271)
Diophante peut être regardé comme l’inventeur de l’Algèbre; . . .

(Lagrange 1795, Oeuvres, vol. 7, p. 219)
page 4: Tartalea exposa sa solution en mauvais vers italiens . . . 

(Lagrange 1795, Oeuvres, vol. 7, p. 22) 
. . . trovato la sua regola generale, ma per al presente la voglio tacere per piu 
rispetti. (Tartaglia 1530, see M. Cantor 1891, vol. II, p. 485) 

page 6: Le Logistique Numerique est celuy qui est exhibé & traité par les nombres,
le Specifique par especes ou formes des choses: comme par les lettres de
l’Alphabet. (Viète 1600, Algebra nova, French ed. 1630)

page 8: Ou ie vous prie de remarquer en passant, que le scrupule, que faisoient les
anciens d’vser des termes de l’Arithmetique en la Geometrie, qui ne pouuoit
proceder, que de ce qu’ils ne voyoient pas assés clairement leur rapport, causoit
beaucoup d’obscurité, & d’embaras, en la façon dont ils s’expliquoient.

(Descartes 1637)

page 18: Quoy que cette proposition ait vne infinité de cas, i’en donneray vne
demonstration bien courte, en supposant 2 lemmes.

Le 1. qui est evident de soy-mesme, que cette proportion se rencontre dans la
seconde base; car il est bien visible que φ est à σ comme 1, à 1.

Le 2. que si cette proportion se trouue dans vne base quelconque, elle se
trouuera necessairement dans la base suivante.

(Pascal 1654, one of the first induction proofs)
page 29: Der Begriff des Logarithmus wird von den Schülern im allgemeinen nur sehr
schwer verstanden. (van der Waerden 1957, p. 1)

page 34: Mense Septembri 1668, Mercator Logarithmotechniam edidit suam, quae spec- 
imen hujus Methodi (i.e.. Serierum Infinitarum) in unica tantum Figura, nempe, 
Quadratura Hyperbolæ continet. (Letter of Collins, Julii 26, 1672) 

page 43: Die Gleichungen . . . haben . . . ein ehrwürdiges Alter. Schon Ptolemäus leitet
(L. Vietoris 1949, J. reine ang. Math., vol. 186, p. 1)


page 52: ... vous ne laisserez pas d’avoir trouvé une proprieté du cercle tres remar- 

quable, ce qui sera celebre a jamais parmi les geometres. 

(Letter of Huygens to Leibniz, Nov. 7, 1674) 
page 57: Au reste tant les vrayes racines que les fausses ne sont pas tousiours reelles; 


mais quelquefois seulement imaginaires; c’est a dire qu’on peut bien tousiours 
en imaginer autant que iay dit en chasque Equation; mais qu’il n’y a quelque- 
fois aucune quantité, qui corresponde a celles qu’on imagine. 

(Descartes 1637, p. 380) 

page 58: ... quomodo quantitates exponentiales imaginariae ad sinus et cosinus arcuum 

realium reducantur. (Euler 1748, Introductio, §138) 

page 62: ... et je voy déja la route de trouver la somme de cette rangée
$\frac11 + \frac14 + \frac19 + \frac1{16}$ etc.

(Joh. Bernoulli, May 22, 1691, letter to his brother)
page 68:


page 70: 


page 76: 


page 92: 


page 98: 


page 118: 


La Théorie des fractions continues est une des plus utiles de l’Arithmétique
. . . comme elle manque dans les principaux Ouvrages d’Arithmétique et
d’Algèbre, elle doit être peu connue des géomètres . . . je serai satisfait si je
puis contribuer à la leur rendre un peu plus familière.

(Lagrange 1793, Oeuvres, vol. 7, p. 6-7)
Die Veranlassung aber, diese Formeln zu suchen, gab mir des Herrn Eulers
Analysis infinitorum, wo der Ausdruck ... in Form eines Beyspieles
vorkommt. (Lambert 1770a)

Ich kann mit einigem Grunde zweifeln, ob gegenwärtige Abhandlung von
denjenigen werde gelesen, oder auch verstanden werden die den meisten Antheil
davon nehmen sollten, ich meyne von denen, die Zeit und Mühe aufwenden,
die Quadratur des Circuls zu suchen. Es wird sicher genug immer solche geben
... die von der Geometrie wenig verstehen . . . (Lambert 1770a)

L’étenduë de ce calcul est immense: il convient aux Courbes mécaniques,
comme aux géometriques; les signes radicaux luy sont indifferens, & même
souvent commodes; il s’étend à tant d’indéterminées qu’on voudra; la
comparaison des infiniment petits de tous les genres luy est également facile. Et
de là naissent une infinité de découvertes surprenantes par rapport aux
Tangentes tant courbes que droites, aux questions De maximis & minimis, aux
points d’infléxion & de rebroussement des courbes, aux Dévelopées, aux
Caustiques par réfléxion ou par réfraction, &c. comme on le verra dans cet ouvrage.

(Marquis de L’Hospital 1696, Analyse des infiniment petits)
Et j’ose dire que c’est cecy le problesme le plus utile, & le plus general non
seulement que ie sçache, mais mesme que i’aye iamais desiré de sçauoir en
Geometrie . . . (Descartes 1637, p. 342)

Quel mépris pour les non-Anglois! Nous les avons trouvé ces methodes, sans
aucun secours des Anglois. (Joh. Bernoulli 1735, Opera, vol. IV, p. 170)
Ce que tu me rapportes à propos de Bernard Niewentijt n’est que quincaillerie.
Qui pourrait s’empêcher de rire devant les ratiocinations si ridicules qu’il bâtit
sur notre calcul, comme s’il était aveugle à ses avantages.

(Letter of Joh. Bernoulli, quoted from Parmentier 1989, p. 316).
Nous appellerons la fonction fx, fonction primitive, par rapport aux fonctions
f′x, f″x, &c. qui en dérivent, et nous appellerons celles-ci, fonctions dérivées,
par rapport à celle-là. (Lagrange 1797)

Je desire seulement qu’il sache que nos questions de maximis et minimis et 
de tangentibus linearum curvarum sont parfaites depuis huit ou dix ans et 
que plusieurs personnes qui les ont vues depuis cinq ou six ans le peuvent 
témoigner. 

(Letter from Fermat to Descartes, June 1638, Oeuvres, tome 2, p. 154-162) 
Mon Frère, Professeur à Bâle, a pris de là occasion de rechercher plusieurs
courbes que la Nature nous met tous les jours devant les yeux . . .

(Joh. Bernoulli 1692)

Je suis tres persuadé qu’il n’y a gueres de geometre au monde qui vous puisse
être comparé. (de L’Hospital 1695, letter to Joh. Bernoulli)

La quantité cy dessus

$$\frac{pp\,a\,ds}{qq\,ss - pp\,aa}$$

se reduit immediatement, sans autre changement, à deux fractions
logarithmicales, en la partageant ainsi

$$\frac{pp\,a\,ds}{qq\,ss - pp\,aa} = \frac{\frac12\,p\,ds}{qs - pa} - \frac{\frac12\,p\,ds}{qs + pa}$$

(Annex to a letter of Joh. Bernoulli, 1699, see Briefwechsel, vol. 1, p. 212)



page 126: 


page 135: 

page 136: 

page 137: 

page 140: 

page 144: 
page 154: 
Problema 3: Si X denotet functionum quamcunque rationalem fractam ipsius
x, methodum describere, cuius ope formulae Xdx integrale investigari
conveniat. (Euler 1768, Opera Omnia, vol. XI, p. 28)

. . . weil die Analysten nach allen Versuchen endlich geschlossen haben, daß
man die Hoffnung aufgeben müsse, elliptische Bogen durch algebraische
Formeln, Logarithmen und Circulbogen auszudrücken.

(J.H. Lambert 1772, Opera, vol. I, p. 312)
Bien que le problème (des quadratures) ait une durée de deux cents ans
à peu près, bien qu’il était l’objet de nombreuses recherches de plusieurs
géomètres : Newton, Cotes, Gauss, Jacobi, Hermite, Tchébychef, Christoffel,
Heine, Radeau [sic], A. Markov, T. Stitjes [sic], C. Possé, C. Andréev, N. Sonin
et d’autres, il ne peut être considéré, cependant, comme suffisamment épuisé.

(Steklov 1918)

On s’assurera aisément par notre méthode que l’intégrale $\int \frac{e^x\,dx}{x}$, dont les
Géomètres se sont beaucoup occupés, est impossible sous forme finie . . .

(Liouville 1835, p. 113)

Claudius Perraltus Medicus Parisinus insignis, tum & Mechanicis atque Ar- 
chitectonicis studiis egregius, & Vitruvii editione notus, idemque in Regia sci- 
entiarum Societate Gallica, dum viveret, non postremus, mihi & aliis ante me 
multis proposuit hoc problema, cujus nondum sibi occurrisse solutionem in- 
genue fatebatur . . . (Leibniz 1693) 

Mais pour juger mieux de l’excellence de vostre Algorithme j’attens avec im- 
patience de voir les choses que vous aurez trouvées touchant la ligne de la 
corde ou chaine pendante, que Mr. Bernouilly vous a proposé å trouver, dont 
je luy scay bon gré, parce que cette ligne renferme des proprietez singulieres 
et remarquables. Je l’avois considerée autre fois dans ma jeunesse, n’ayant que 
15 ans, et j’avois demontré au P. Mersenne, que ce n’estoit pas une Parabole 
(Letter of Huygens to Leibniz, Oct. 9, 1690) 
Les efforts de mon frere furent sans succes, pour moi, je fus plus heureux, car
je trouvai l’adresse ... Il est vrai que cela me couta des meditations qui me
deroberent le repos d’une nuit entiere . . .

(Joh. Bernoulli, see Briefwechsel, vol. 1, p. 98)
Datis in plano verticali duobus punctis A & B assignare mobili M, viam AMB
per quam gravitate sua descendens et moveri incipiens a puncto A, brevissimo
tempore perveniat ad alterum punctum B. (Joh. Bernoulli 1696)

Ce problème me paroist des plus curieux et des plus jolis que l’on ait encore
proposé, et je serois bien aise de m’y appliquer, mais pour cela il seroit
necessaire que vous me l’envoyassiez réduit à la mathematique pure, car le phisique
m’embarasse . . . (de L’Hospital, letter to Joh. Bernoulli, June 15, 1696)

En vérité rien n’est plus ingenieux que la solution que vous donnez de l’égalité
de Mr. votre frere; & cette solution est si simple qu’on est surpris que ce
probleme ait paru si difficile: c’est là ce qu’on appelle une élégante solution.

(P. Varignon, letter to Joh. Bernoulli “6 Aoust 1697”)
Per liberare la premessa formula dalle seconde differenze, .... chiamo p la 
sunnormale BF. (Riccati 1712) 

... es ist ganz unmöglich, heute noch eine Zeile von d’Alembert
hinunterzuwürgen, während man die meisten Eulerschen Sachen noch mit Entzücken
liest. (Jacobi, see Spiess 1929, p. 139)

Ich habe immer wieder beobachtet, daß Mathematiker und Physiker mit
abgeschlossenem Examen über theoretische Ergebnisse sehr gut, aber über die
einfachsten Näherungsverfahren nicht Bescheid wußten.

(L. Collatz 1951, Num. Beh. Diffgl., Springer-Verlag)

PROBLEMA 85: Proposita aequatione differentiali quacunque eius integrale 
completum vero proxime assignare. (Euler 1768, §650) 
page 156: PROBLEMA 86: Methodum praecedentem aequationes differentiales proxime
integrandi magis perficere, ut minus a veritate aberret. (Euler 1768, §656)
page 160: Der König nennt mich ‘meinen Professor’, und ich bin der glücklichste Mensch
auf der Welt! (Euler is proud to serve Frederick II in Berlin)

J’ai ici un gros cyclope de géomètre ... il ne reste plus qu’un oeil à notre
homme, et une courbe nouvelle, qu’il calcule à présent, pourrait le rendre
aveugle tout à fait. (Frederick II, see Spiess 1929, p. 165-166.)

page 170: . . . et je ne réponds pas que je fasse encore de la géométrie dans dix ans d’ici. Il
me semble aussi que la mine est presque déjà trop profonde et ... il faudra tôt
ou tard l’abandonner. La physique et la chimie offrent maintenant des richesses
plus brillantes et d’une exploitation plus facile . . .

(Lagrange, Sept. 21, 1781, Letter to d’Alembert, Oeuvres, vol. 13, p. 368)
page 172: On dit qu’une grandeur est la limite d’une autre grandeur, quand la seconde
peut approcher de la premiere plus pres que d’une grandeur donnée, si petite
qu’on la puisse supposer, . . .

(D’Alembert 1765, Encyclopédie, tome neuvieme, à Neufchastel)
Lorsqu’une quantité variable converge vers une limite fixe, il est souvent utile
d’indiquer cette limite par une notation particulière, c’est ce que nous ferons,
en plaçant l’abréviation

lim

devant la quantité variable dont il s’agit . . .

(Cauchy 1821, Cours d’Analyse)
page 177: . . . Je mehr ich ueber die Principien der Functionentheorie nachdenke — und
ich thue dies unablässig — , um so fester wird meine Ueberzeugung, dass diese
auf dem Fundamente algebraischer Wahrheiten aufgebaut werden muss . . .

(Weierstrass 1875, Werke, vol. 2, p. 235)
Bitte vergiß alles, was Du auf der Schule gelernt hast; denn Du hast es nicht
gelernt. . . . indem meine Töchter bekanntlich schon mehrere Semester studieren
(Chemie), schon auf der Schule Differential- und Integralrechnung gelernt zu
haben glauben und heute noch nicht wissen, warum $x \cdot y = y \cdot x$ ist.

(Landau 1930)

$\sqrt{3}$ ist also nur ein Zeichen für eine Zahl, welche erst noch gefunden werden
soll, nicht aber deren Definition. Letztere wird jedoch in meiner Weise, etwa

(1.7, 1.73, 1.732, ...)

befriedigend gegeben. (G. Cantor 1889)

. . . Definition der irrationalen Zahlen, bei welcher Vorstellungen der
Geometrie . . . oft verwirrend eingewirkt haben. . . . Ich stelle mich bei der Definition
auf den rein formalen Standpunkt, indem ich gewisse greifbare Zeichen Zahlen
nenne, so dass die Existenz dieser Zahlen also nicht in Frage steht.

(Heine 1872)

Für mich war damals das Gefühl der Unbefriedigung ein so überwältigendes,
dass ich den festen Entschluss fasste, so lange nachzudenken, bis ich eine
rein arithmetische und völlig strenge Begründung der Principien der
Infinitesimalanalysis gefunden haben würde. . . . Dies gelang mir am 24. November
1858, . . . aber zu einer eigentlichen Publication konnte ich mich nicht recht
entschliessen, weil erstens die Darstellung nicht ganz leicht, und weil
ausserdem die Sache so wenig fruchtbar ist. (Dedekind 1872)

Die Analysis zu einem blossen Zeichenspiele herabwürdigend . . .

(Du Bois-Reymond 1882, Allgemeine Funktionentheorie, Tübingen)
page 181: ... jusqu’à présent on a regardé ces propositions comme des axiomes.

(Méray 1869, see Dugac 1978, p. 82)
page 184: Une chose étonnante, je trouve, c’est que Monsieur Weierstrass et Monsieur
Kronecker peuvent trouver tant d’auditeurs — entre 15 et 20 — pour des cours
si difficiles et si élevés.

(letter of Mittag-Leffler 1875, see Dugac 1978, p. 68)
page 188: Je consacrerai toutes mes forces à répandre de la lumière sur l’immense
obscurité qui règne aujourd’hui dans l’Analyse. Elle est tellement dépourvue de tout
plan et de tout système, qu’on s’étonne seulement qu’il y ait tant de gens qui
s’y livrent — et ce qui pis est, elle manque absolument de rigueur.

(Abel 1826, Oeuvres, vol. 2, p. 263)
Cauchy est fou, et avec lui il n’y a pas moyen de s’entendre, bien que pour le
moment il soit celui qui sait comment les mathématiques doivent être traitées.
Ce qu’il fait est excellent, mais très brouillé . . .

(Abel 1826, Oeuvres, vol. 2, p. 259)
page 202: On appelle ici Fonction d’une grandeur variable, une quantité composée de
quelque manière que ce soit de cette grandeur variable & de constantes.

(Joh. Bernoulli 1718, Opera, vol. 2, p. 241)
Quocirca, si $f(\frac{x}{a} + c)$ denotet functionem quamcunque . . .

(Euler 1734, Opera, vol. XXII, p. 59)
Entspricht nun jedem x ein einziges, endliches y, ... so heisst y eine . . .
Function von x für dieses Intervall. . . . Diese Definition schreibt den einzelnen
Theilen der Curve kein gemeinsames Gesetz vor; man kann sich dieselbe aus
den verschiedenartigsten Theilen zusammengesetzt oder ganz gesetzlos
gezeichnet denken. (Dirichlet 1837)

page 204: . . . f(x) sera fonction continue, si ... la valeur numérique de la différence

$f(x + \alpha) - f(x)$

décroît indéfiniment avec celle de $\alpha$. . . .

(Cauchy 1821, Cours d’Analyse, p. 43)
Wir nennen dabei eine Grösse y eine stetige Function von x, wenn man nach
Annahme einer Grösse $\varepsilon$ die Existenz von $\delta$ beweisen kann, sodass zu jedem
Wert zwischen $x_0 - \delta \ldots x_0 + \delta$ der zugehörige Wert von y zwischen
$y_0 - \varepsilon \ldots y_0 + \varepsilon$ liegt. (Weierstrass 1874)


page 206: Ce théorème est connu depuis longtemps . . .

(Lagrange 1807, Oeuvres, vol. 8, p. 19, see also p. 133)
In seinem Satze, dem zufolge eine stetige Funktion einer reellen Veränderlichen
ihre obere und untere Grenze stets wirklich erreicht, d. h. ein Maximum und
Minimum notwendig besitzt, schuf WEIERSTRASS ein Hilfsmittel, dass heute
kein Mathematiker bei feineren analytischen oder arithmetischen
Untersuchungen entbehren kann. (Hilbert 1897, Gesammelte Abh., vol. 3, p. 333)

page 209: Der Begriff des Grenzwertes einer Funktion ist wohl zuerst von Weierstrass mit
genügender Schärfe definiert worden.

(Pringsheim 1899, Enzyclopädie der Math. Wiss., Band II.1, p. 13)
page 213: Dans l’ouvrage de M. Cauchy on trouve le théorème suivant: “Lorsque les
différens termes de la série $u_0 + u_1 + u_2 + \ldots$ sont des fonctions . . .
continues, ... la somme s de la série est aussi . . . fonction continue de x.” Mais il me
semble que ce théorème admet des exceptions. Par exemple la série

$\sin x - \frac12 \sin 2x + \frac13 \sin 3x - \ldots$

est discontinue pour toute valeur $(2m + 1)\pi$ de x, . . .

(Abel 1826, Oeuvres, vol. 1, p. 224-225)
page 217: Es scheint aber noch nicht bemerkt zu sein, dass . . . diese Continuität in jedem
einzelnen Punkte . . . nicht diejenige Continuität ist . . . die man gleichmässige
Continuität nennen kann, weil sie sich gleichmässig über alle Punkte und alle
Richtungen erstreckt. (Heine 1870, p. 361)

Den allgemeinen Gang des Beweises einiger Sätze im §. 3 nach den
Principien des Herrn Weierstrass kenne ich durch mündliche Mittheilungen von ihm
selbst, von Herrn Schwarz und Cantor, so dass . . . (Heine 1872, p. 182)
page 221:

page 224: 
page 230: 

page 232: 
page 235: 

page 240: 
page 242: 

page 245: 
page 252: 

page 253: 

page 263: 


Also zuerst: Was hat man unter $\int_a^b f(x)\,dx$ zu verstehen?

(Riemann 1854, Werke, p. 239)
L’illustre géomètre [Riemann] . . . généralise, par une de ces vues qui
n’appartiennent qu’aux esprits de premier ordre, la notion de l’intégrale définie, . . .

(Darboux 1875)

Ich fühle indessen, dass die Art, wie das Criterium der Integrirbarkeit formulirt
wurde, etwas zu wünschen übrig lässt. (Du Bois-Reymond 1875, p. 259)
Bis in die neueste Zeit glaubte man, es sei das Integral einer convergenten Reihe
. . . gleich der Summe aus den Integralen der einzelnen Glieder, und erst Herr
Weierstrass hat bemerkt . . .

(Heine 1870, Ueber trig. Reihen, J. f. Math., vol. 70, p. 353)
Da diese Functionen noch nirgends betrachtet sind, wird es gut sein, von einem
bestimmten Beispiele auszugehen. (Riemann 1854, Werke, p. 228)

. . . la rigueur, dont je m’étais fait une loi dans mon Cours d’analyse, . . .

(Cauchy 1829, Leçons)

Die vollständige Veränderung $f(x + h) - f(x)$ . . . lässt sich im allgemeinen in
zwei Teile zerlegen . . . (Weierstrass 1861)

Voir la belle démonstration de ce théorème, donnée par M. O. Bonnet, dans le
Traité de Calcul différentiel et intégral de M. Serret, t. I, p. 17.

(Darboux 1875, p. 111)

. . . tout à fait au-dessus de la vaine gloire, que la plupart des Sçavans
recherchent avec tant d’avidité . . .

(Fontenelle’s opinion concerning Guillaume-François-Antoine de Lhospital,
Marquis de Sainte-Mesme et du Montellier, Comte d’Antremonts, Seigneur
d’Ouques, 1661-1704)

Au reste je reconnois devoir beaucoup aux lumieres de Mrs Bernoulli, sur tout
à celles du jeune presentement Professeur à Groningue. Je me suis servi sans
façon de leurs découvertes . . . (de L’Hospital 1696)

Où est-il démontré qu’on obtient la différentielle d’une série infinie en prenant
la différentielle de chaque terme?

(Abel, Janv. 16, 1826, Oeuvres, vol. 2, p. 258)
... et de juger de la valeur du reste de la série. Ce problème, l’un des plus
importants de la théorie des séries, n’a pas encore été résolu . . .

(Lagrange 1797, Oeuvres, vol. 9, p. 42-43, 71)
... la formule de TAYLOR, cette formule ne pouvant plus être admise comme
générale . . . (Cauchy 1823, Résumé, p. 1)

. . . mais celui qui me fait le plus de plaisir c’est un mémoire ... sur la simple
série

$1 + mx + \frac{m(m-1)}{2}\,x^2 + \ldots$

J’ose dire que c’est la premiere démonstration rigoureuse de la formule binome
(Abel, letter to Holmboe 1826, Oeuvres, vol. 2, p. 261)
Bis auf die neueste Zeit hat man allgemein angenommen, dass eine . . .
continuirliche Function . . . auch stets eine erste Ableitung habe, deren Werth nur
an einzelnen Stellen unbestimmt oder unendlich gross werden könne. Selbst
in den Schriften von Gauss, Cauchy, Dirichlet findet sich meines Wissens
keine Äusserung, aus der unzweifelhaft hervor ginge, dass diese Mathematiker,
welche in ihrer Wissenschaft die strengste Kritik überall zu üben gewohnt
waren, anderer Ansicht gewesen seien. (Weierstrass 1872)

Il y a cent ans, une pareille fonction eût été regardée comme un outrage au sens
commun.

(Poincaré 1899, L’oeuvre math. de Weierstrass, Acta Math., vol. 22, p. 5)
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page 266: Telle est la proposition fondamentale qui a été établie par Weierstrass. 

(Borel 1905, p. 50) 

page 273: Es mag auffallend erscheinen, dass diese so einfache Idee, welche im Grunde
genommen in weiter nichts besteht, als dass eine Vielfachsumme verschiedener
Grössen (als welche hiernach die extensive Grösse erscheint) als selbstständige
Grösse behandelt wird, in der That zu einer neuen Wissenschaft sich entfalten
soll; . . . (Grassmann 1862, Ausdehnungslehre, p. 5)

... il est très utile d’introduire la considération des nombres complexes, ou
nombres formés avec plusieurs unités, . . .

(Peano 1888a, Math. Ann., vol. 32, p. 450)
page 278: Unter einer “Menge” verstehen wir jede Zusammenfassung M von bestimmten 
wohlunterschiedenen Objekten m unserer Anschauung oder unseres Denkens 
(welche die “Elemente” von M genannt werden) zu einem Ganzen. 

(G. Cantor 1895, Werke, p. 282)
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben
können. (Hilbert, Math. Ann., vol. 95, p. 170)

page 283: Nous avons déjà signalé et nous reconnaîtrons dans tout le cours de ce Livre
l’importance des ensembles compacts. Tous ceux qui ont eu à s’occuper d’Analyse
générale ont vu qu’il était impossible de s’en passer.

(Fréchet 1928, Espaces abstraits, p. 66) 
page 287: ... ist die Schwierigkeit, welche nach dem Urtheile aller Mathematiker . . . das
Studium jenes Werkes wegen seiner . . . mehr philosophischen als mathematischen
Form dem Leser bereitet .... Jene Schwierigkeit nun zu beheben, war
daher eine wesentliche Aufgabe für mich, wenn ich wollte, dass das Buch nicht
nur von mir, sondern auch von anderen gelesen und verstanden werde.

(Grassmann 1862, “Professor am Gymnasium zu Stettin”) 
page 291: Eine stetige Kurve kann Flächenstücke enthalten: das ist eine der merkwürdigsten
Tatsachen der Mengenlehre, deren Entdeckung wir G. Peano verdanken.

(Hausdorff 1914, p. 369) 

page 300: Wir Deutsche gebrauchen statt dessen nach Jacobi’s Vorgange für partielle
Ableitungen das runde ∂. (Weierstrass 1874)

page 302: . . . daß Weierstraß’ unmittelbarer Unterricht die Spontanität der Hörer zu sehr
unterdrückte und in der Tat nur für den voll verständlich war, der schon anderweitig
mit dem Stoff sich vertraut gemacht hatte. Die größeren Werke sind
von Ausländern geschrieben . . . Wohl das erste stammt von meinem Freunde
Stolz (Innsbruck): “Vorlesungen über allgemeine Arithmetik” ....

(F. Klein 1926, Entwicklung der Math., p. 291) 
page 316: Or il est facile de voir que les différentielles de cette espèce conservent les
mêmes valeurs quand on intervertit l’ordre suivant lequel les différentiations
relatives aux diverses variables doivent être effectuées.

(Cauchy 1823, Résumé, p. 76) 

page 330: On sait que l’évaluation ou même la réduction des intégrales multiples présente
généralement de très grandes difficultés . . .

(Dirichlet 1839, Werke, vol. I, p. 377) 
page 336: Besonderen Stolz legte Dirichlet auf seine Methode des diskontinuierlichen 
Faktors zur Bestimmung vielfacher Integrale. Er pflegte zu sagen, es ist das 
ein sehr einfacher Gedanke, und schmunzelnd hinzuzufügen, aber man muss
ihn haben. (H. Minkowski, Jahrber. DMV, 14 (1905), p. 161) 

page 338: . . . locum habet pro quacunque alia formula $\iint Z\,dx\,dy$, quippe quae per easdem
substitutiones transformatur in hac $\iint Z\,(VR - ST)\,dt\,du$ . . .

(Euler 1769b) 